Emergent Neural Automaton Policies:
Learning Symbolic Structure from Visuomotor Trajectories

Robotics Institute, Carnegie Mellon University
* Equal contribution

ENAP enables unsupervised discovery of task structures, improving both interpretability and task performance.

Abstract

Scaling robot learning to long-horizon tasks remains a formidable challenge. End-to-end policies often lack the structural priors needed for effective long-term reasoning, while traditional neuro-symbolic methods rely heavily on hand-crafted symbolic priors. To address this tension, we introduce ENAP (Emergent Neural Automaton Policy), a framework in which a bi-level neuro-symbolic policy emerges adaptively from demonstrations. Specifically, we first employ adaptive clustering and an extension of the \(L^*\) algorithm to infer a Mealy state machine from visuomotor data; this machine serves as an interpretable high-level planner that captures latent task modes. The discrete structure then guides a low-level reactive residual network, trained by behavior cloning, to produce precise continuous control. By explicitly modeling the task policy with discrete transitions and continuous residuals, ENAP achieves high sample efficiency and interpretability without requiring task-specific labels. Extensive experiments on complex manipulation and long-horizon tasks demonstrate that ENAP outperforms state-of-the-art end-to-end VLA policies by up to 27% in low-data regimes, while offering a structured representation of robotic intent.
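To make the symbol-abstraction idea concrete, the sketch below clusters per-timestep visuomotor features into discrete symbols. The paper's adaptive clustering procedure is not detailed here, so plain k-means (with `kmeans_symbols` as a hypothetical helper name) stands in for it.

```python
import numpy as np

def kmeans_symbols(features, k=3, iters=50):
    """Toy stand-in for ENAP's symbol-abstraction step: cluster per-step
    visuomotor features so each timestep receives a discrete symbol id.
    (Plain k-means is an illustrative assumption, not the paper's method.)"""
    # deterministic init: k evenly spaced samples from the trajectory
    centers = features[np.linspace(0, len(features) - 1, k).astype(int)]
    for _ in range(iters):
        # assign each feature vector to its nearest center
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # recompute centers; keep the old center if a cluster goes empty
        for j in range(k):
            if (labels == j).any():
                centers[j] = features[labels == j].mean(axis=0)
    return labels

# three well-separated feature blobs -> three discrete symbols
rng = np.random.default_rng(1)
feats = np.concatenate([rng.normal(c, 0.1, (30, 2)) for c in (0.0, 5.0, 10.0)])
symbols = kmeans_symbols(feats, k=3)
print(len(set(symbols.tolist())))  # 3 distinct symbols
```

Downstream, each demonstration becomes a sequence of such symbol ids, which is the alphabet the automaton-learning stage operates on.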

Method Overview

ENAP follows a three-stage pipeline—(i) symbol abstraction, (ii) structure extraction via an extended \(L^*\), and (iii) bi-level control—to learn structured policies from demonstrations.
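To make the extracted structure concrete, below is a minimal, hand-written Mealy machine of the kind stage (ii) could produce. The states, input symbols, and outputs are illustrative assumptions, not taken from the paper: inputs are abstracted observation symbols, and outputs name the task mode handed to the low-level controller.

```python
class MealyMachine:
    """Minimal Mealy machine: each (state, input) pair maps to a
    (next_state, output) pair, so outputs are emitted on transitions."""
    def __init__(self, transitions, start):
        self.transitions = transitions  # {(state, symbol): (next_state, output)}
        self.state = start

    def step(self, symbol):
        self.state, output = self.transitions[(self.state, symbol)]
        return output

# A toy pick-and-place structure: reach -> grasp -> place (hypothetical).
machine = MealyMachine(
    transitions={
        ("REACH", "far"):         ("REACH", "move_to_object"),
        ("REACH", "near_object"): ("GRASP", "close_gripper"),
        ("GRASP", "near_object"): ("GRASP", "close_gripper"),
        ("GRASP", "holding"):     ("PLACE", "move_to_goal"),
        ("PLACE", "holding"):     ("PLACE", "move_to_goal"),
        ("PLACE", "at_goal"):     ("DONE",  "open_gripper"),
    },
    start="REACH",
)

outputs = [machine.step(s) for s in ["far", "near_object", "holding", "at_goal"]]
print(outputs)  # ['move_to_object', 'close_gripper', 'move_to_goal', 'open_gripper']
print(machine.state)  # DONE
```

In ENAP this transition table is not hand-written but inferred from symbol sequences by the extended \(L^*\) algorithm; the point here is only the interface the high-level planner exposes.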

Inference Pipeline

ENAP resolves multi-modal decisions by leveraging a learned state machine and observation-conditioned residual control to guide transitions into the correct logical branch.
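A minimal sketch of one bi-level control step, under assumed names and shapes: the automaton's output selects a per-mode nominal action, and an observation-conditioned residual (here an untrained linear map standing in for the learned residual network) refines it.

```python
import numpy as np

# Per-mode nominal actions (illustrative 2-D actions; names are assumptions).
NOMINAL = {
    "move_to_object": np.array([1.0, 0.0]),
    "move_to_goal":   np.array([0.0, 1.0]),
}

def residual(obs, weights):
    # toy linear "residual network"; in ENAP this is learned by
    # behavior cloning, conditioned on the current discrete state
    return weights @ obs

def control_step(mode, obs, weights):
    # low-level action = mode-specific nominal action + learned correction
    return NOMINAL[mode] + residual(obs, weights)

obs = np.array([0.2, -0.1, 0.05])
W = np.zeros((2, 3))  # untrained residual -> pure nominal action
print(control_step("move_to_object", obs, W))  # [1. 0.]
```

The discrete mode thus resolves which branch of a multi-modal decision is active, while the residual supplies the precise continuous correction around it.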

Comparison with the State of the Art
Complex Manipulation Tasks
Method             Param (M)   DualStack Cube (%)   Peg Insert (%)
Oracle                 2.98        98.3 ±0.4            86.7 ±0.8
Transformer           63.81        38.7 ±6.0            51.8 ±5.5
GMM                   46.11        73.6 ±2.3            53.1 ±2.6
Diffusion Policy     114.39        41.2 ±7.2            31.1 ±6.8
OpenVLA             7652.10        69.8 ±2.0            42.3 ±2.8
\(\pi_0\)           3288.52        73.4 ±1.2            51.6 ±1.4
ENAP (Oracle)          2.66        98.8 ±0.3            85.6 ±0.6
ENAP* (DINO)          22.94        76.0 ±2.0            63.2 ±2.4
Long-Horizon TAMP Tasks
Method           Seq. 3/5 (%)   Seq. 5/5 (%)   Hier. 3/5 (%)   Hier. 5/5 (%)
FLOWER            91.0 ±0.6      90.6 ±0.5      90.8 ±0.7       15.9 ±0.4
ENAP (FLOWER)     97.0 ±0.4      96.8 ±0.3      95.5 ±0.5       28.2 ±0.6
Real-World Evaluation
Method           Param (M)   Speed (ms)   StackLego (%)   PickPlace (%)   Hanger Task (%)
\(\pi_{0.5}\)       3403        6841          58.82           76.47           64.71
ENAP* (DINO)          23         281          88.24           94.12           94.12
Qualitative Results
[Figure: real-time PMM transitions on StackLego, with rollout frames grouped by discovered Clusters 0–6]

StackLego is a high-precision assembly task where a blue brick must be placed onto a fixed red brick without force feedback, evaluated by graded stacking success.

[Figure: rollout frames for Hanger grouped by discovered Clusters 0–4]

Hanger is a manipulation task requiring the agent to unhook a hanger and transfer it across an obstacle to the opposite side of a rack.

[Figure: rollout frames for MultiGoalPickPlace grouped by discovered Clusters 0–8]

MultiGoalPickPlace is a sorting task where the agent must match and place multiple colored cans into their corresponding bowls from randomized initial positions.