Visuomotor policies aim to learn complex manipulation tasks from expert demonstrations. However, generating smooth and coherent trajectories remains challenging, as it requires balancing proximal precision with distal foresight. Existing approaches typically optimize the action distribution within each chunk and often neglect coherence between chunks. The resulting inter-chunk discontinuities significantly impede the learning of coherent long-horizon actions.
To overcome this limitation and balance precision with foresight, we propose FocalPolicy, a foresight-aware visuomotor policy that combines Frequency-Optimized Chunking with Locally Anchored flow matching. We introduce a foresight composite objective that supervises time-domain alignment on the proximal actions while regularizing frequency-domain structure over multiple future action chunks to improve cross-chunk coherence. To learn complex action distributions efficiently, we design locally anchored sampling, which propagates the target signal more efficiently during consistency flow matching training. Extensive experiments demonstrate that our method consistently outperforms existing approaches.
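As a rough illustration of how a composite objective of this kind could be assembled, the sketch below adds a time-domain error on the proximal actions to a frequency-domain error over the full multi-chunk horizon. It is a minimal sketch under stated assumptions, not the paper's formulation: the function name `foresight_composite_loss`, the parameters `proximal_len` and `freq_weight`, the use of mean squared error, and the choice of a real FFT along the time axis are all hypothetical.

```python
import numpy as np

def foresight_composite_loss(pred, target, proximal_len, freq_weight=0.1):
    """Hypothetical composite loss over a multi-chunk action horizon.

    pred, target: arrays of shape (horizon, action_dim) spanning several
    consecutive action chunks. proximal_len and freq_weight are assumed
    hyperparameters, not values taken from the paper.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Time-domain term: mean squared error on the proximal actions only.
    time_term = np.mean((pred[:proximal_len] - target[:proximal_len]) ** 2)

    # Frequency-domain term: compare spectra along the time axis over the
    # whole horizon, so errors in distal chunks are still penalized.
    pred_spec = np.fft.rfft(pred, axis=0, norm="ortho")
    target_spec = np.fft.rfft(target, axis=0, norm="ortho")
    freq_term = np.mean(np.abs(pred_spec - target_spec) ** 2)

    return time_term + freq_weight * freq_term
```

In this sketch, an error confined to the distal part of the horizon leaves the time-domain term untouched and is penalized only through the spectral term, which is one way to read "proximal precision with distal foresight".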
The pipeline of FocalPolicy. We propose Locally Anchored Sampling (LAS) to improve the training efficiency of consistency flow matching. The policy is optimized via a Foresight Composite Objective (FCO), which synergizes proximal precision with distal coherence.
FocalPolicy demonstrates stable long-horizon execution on multi-stage tasks such as Tower Stacking and Cup Matching.
@inproceedings{he2026focalpolicy,
title={FocalPolicy: Frequency-Optimized Chunking and Locally Anchored Flow Matching for Coherent Visuomotor Policy},
author={He, Qian and Yang, Zhenshuo and Liang, Wenqi and Hao, Chunhui and Sebe, Nicu and Tian, Jiandong},
booktitle={Proceedings of the 43rd International Conference on Machine Learning},
year={2026}
}