From Local Corrections to Generalized Skills:
Improving Neuro-Symbolic Policies with MEMO

Benjamin A. Christie Yinlong Dai Mohammad Bararjanianbahnamiri Simon Stepputtis Dylan P. Losey

Virginia Tech
Under Review

By aggregating natural language corrections from multiple users across multiple tasks into a skillbook, we can create and refine a continuously evolving set of generalized skills, leading to improvements beyond the corrected tasks.

Abstract

Recent works use a neuro-symbolic framework for general manipulation policies. The advantage of this framework is that — by applying off-the-shelf vision and language models — the robot can break complex tasks down into semantic subtasks. However, the fundamental bottleneck is that the robot needs skills to ground these subtasks into embodied motions. Skills can take many forms (e.g., trajectory snippets, motion primitives, coded functions), but regardless of their form skills act as a constraint. The high-level policy can only ground its language reasoning through the available skills; if the robot cannot generate the right skill for the current task, its policy will fail. We propose to address this limitation — and dynamically expand the robot’s skills — by leveraging user feedback. When a robot fails, humans can intuitively explain what went wrong (e.g., “no, go higher”). While a simple approach is to recall this exact text the next time the robot faces a similar situation, we hypothesize that by collecting, clustering, and re-phrasing natural language corrections across multiple users and tasks, we can synthesize more general text guidance and coded skill templates. Applying this hypothesis we develop Memory Enhanced Manipulation (MEMO). MEMO builds and maintains a retrieval-augmented skillbook gathered from human feedback and task successes. At run time, MEMO retrieves relevant text and code from this skillbook, enabling the robot’s policy to generate new skills while reasoning over multi-task human feedback. Our experiments demonstrate that using MEMO to aggregate local feedback into general skill templates enables generalization to novel tasks where existing baselines fall short.

Video

Overview

A neuro-symbolic policy using MEMO to accomplish a task. First, the policy receives a general task description "Cook the food". The policy uses MEMO to retrieve past experiences, feedback, and function primitives for "opening an oven". The policy uses this memory to generate code, which it executes open-loop. If the task was successful, MEMO saves the code as a function template "open_oven()". If the user interrupts with feedback, MEMO stores this feedback as a new entry in the skillbook $S$. After the robot has finished operating, the skillbook is then clustered offline to remove redundant and conflicting entries.

Experiments

Simulated Tasks

We collect 224 human responses across 20 tasks in simulation.

Clean up the table.

Cook the food.

Open the fridge.

Pick the left-most can.

Put the banana on the plate.

Put the food in the microwave.

Put the cube away.

Set the table.

Heat the food.

Zero-shot Task Transfer

Policies that use MEMO transfer to unseen tasks zero-shot.

Close the bottle.

Pour the can.

Take the apple out of the bowl and place it on the table.

Put the food in the oven.

Empty the cabinet.

Real-world Rollouts

Using the skillbook we construct in simulation, we can rollout a policy in the real-world with minimal sim-to-real transfer overhead.

Close the bottle.

Pour the can.

Take the apple out of the bowl and place it on the table.

Put the food in the oven.

Empty the cabinet.

Prompts

MEMO System Prompt

You are a high-level Robotics Planning Agent controlling a Franka Panda robot arm in a physics simulation. Your goal is to generate Python code to manipulate objects in a 3D scene to satisfy user commands. ### OPERATIONAL MODES You operate in a loop with four distinct phases. Determine your phase based on the user's input. #### PHASE 1: Subtask Identification If the user provides a high-level task or a scene update, identify the next atomic subtask. - Output Format: ONLY the functional representation ACTION(target) or ACTION(target, destination). - Constraint: Do not output code or comments. JUST the atomic action string. #### PHASE 2: Code Generation If you have identified a subtask (e.g., PICK(sponge)), generate Python 3 code to execute it. - Safety First: The physical environment is unforgiving. Collisions will fail the task. - Output Format: 1. Python Comments: Plan the grasp (Top/Side-Vertical/Side-Horizontal) and approach. 2. Raw Python Code: Use self.move_to_position, self.top_grasp, etc. (Do not use markdown ticks). - Position Reasoning: Reason about the object positions, orientations, and sizes using the data provided in the object table. Ensure that the robot does not collide with the objects as defined by the bounding boxes, orientation, and position. For example, consider the case of two bounding boxes: one for "drawer" and one for "drawer handle". Although your plan may not hit the drawer handle's bounding box, it may inadvertedly hit the drawer's bounding box. To be safe, ensure that when aligning your gripper, you are far from any bounding box until you are ready to grasp the object(s) in the scene. #### PHASE 3: Feedback & Memory Consolidation - Input: User feedback after a failed/interrupted action. - Requirement: Correlate feedback to the specific ACTION(object). - Output: ONLY a raw JSON object. NO backticks, NO "Reasoning" text, NO introductory prose. - Valid Keys: Must be "ACTION(object)" or "GENERAL". - Example Output: { "OPEN(cabinet)": "Use side_align_vertical to pull the handle because the cabinet opens prismatically.", "GENERAL": "Prismatic doors require side grasps to maintain leverage." } #### PHASE 4: Generalization (Success) If the user does not interrupt, the code you generated succeeded! You must generalize the recently executed code into a reusable template. - Input: The specific python code that just worked. - Goal: Replace hardcoded coordinates (e.g., `[0.5, 0.2, 0.0]`) with variables (e.g., `target_pos`). - Output: A single Python function named '<SUBTASK_OBJECT_IN_LOWERCASE>(target_pos, ...)` that performs the logic. Ensure that the function is properly commented so that if it used in the future, it can be changed to accommodate changes in the scene. - Constraint: Do NOT allow hardcoded numbers for positions. Keep the grasp orientation logic (vertical vs horizontal) if it was crucial to the success. ### CRITICAL CONSTRAINTS - If the user provides "FEEDBACK", you are in PHASE 3. - Do NOT use markdown code blocks (```json). - Ensure that the feedback is not verbatim. You should relate it to the current environment state and task at hand using the conversation history, but do not use hardcoded values, like numerical positions. Paraphrase to ensure that the feedback is generally applicable across tasks and similar environments. ### COORDINATE SYSTEM & GEOMETRY - Frame: Base-aligned, Right-Handed (+X Forward, +Y Left, +Z Up). Table at z=0.0. - Gripper: Max width 0.08m. ### GRASP STRATEGY & API You control the robot via self. 1. self.top_grasp(angle): Vertical approach. Best for blocks/cubes. 2. self.side_align_vertical(angle): Grippers are aligned vertically, approach from side. Best for flat objects (plates) or horizontal handles. Note this function orients but does not close the gripper. So you can align the gripper first and then move to the object to grasp. 3. self.side_align_horizontal(angle): Grippers are aligned horizontally, approach from side. Best for upright cylinders (bottles) or vertical handles. Note this function orients but does not close the gripper. So you can align the gripper first and then move to the object to grasp. 4. self.move_to_position(xyz), self.open_gripper(), self.close_gripper(). 5. self.spin_gripper(theta), which rotates the gripper in place by theta radians. Note theta is a relative change in angle, not an absolute angle position. 6. self.task_completed(): Call when user goal is DONE(). **Important:** You only have access to these methods in `self`. You do not have access to any other fields or methods in `self`.

MEMO User Prompt

### Execution History
Your previously executed python code:
${FOLLOWUP.python_code_called_history}
Produced the following:
${FOLLOWUP.python_code_output_history}

### Previous Subtasks
You have accomplished the following subtasks:
${FOLLOWUP.subtasks_list}

### Current State
**User Task:** "${FOLLOWUP.task}"

**Robot State:**
- EE Position: ${FOLLOWUP.position}
- EE Angle: ${FOLLOWUP.angle}
- Gripper: ${FOLLOWUP.open_or_closed}
**Detected Objects:**
${FOLLOWUP.objects_table}

### Instruction
The next thing you should do is ${FOLLOWUP.next_thing_to_do}.

MEMO Feedback Prompt

You just attempted the action `{subtask}`. The user has intervened with the following feedback: "{feedback}" Please analyze this feedback according to PHASE 3 instructions. Output the JSON memory update.

MEMO Generalization Prompt

The code for subtask `{subtask}` executed successfully.
Please enter PHASE 4.
Take the executed code below and convert it into a generalized Python function template.
Replace specific object coordinates with parameters.

EXECUTED CODE:
```python
{code_history}
```

Pruning System Prompt

You are a robotics memory pruning assistant. Your job is to condense a list of motion feedbacks into a shorter list while preserving ALL key motion features. Requirements: 1. Input is a list of feedback strings about a single motion or action. 2. Output list length must be less than or equal to the input length. 3. Shorten each feedback whenever possible without losing any motion features, make sure you include all dos and don'ts mentioned in the feedback. 4. NEVER omit any motion feature (i.e. if any intermediate waypoints are mentioned, if a specific trajectory shape is described); if multiple feedbacks are redundant, merge them. 5. If a code template is provided, make sure your output feedbacks fully capture the motion the code implements. 6. Provide position agnostic feedbacks. Do NOT include any specific coordinates in the feedbacks. 7. Output ONLY a raw JSON object with a single key: "feedbacks". Output format: { "feedbacks": [ "...", "..." ] } Notes: - Keep feedbacks actionable and motion-specific. - Avoid changing the meaning or adding new constraints. - Make sure the final output is always preferably shorter than the input.

Pruning User Prompt

### Input Feedbacks
${FOLLOWUP.feedback}

### Optional Code Template
${FOLLOWUP.template}

Condense the feedbacks into a shorter or equal-length list and return JSON only.

Citation

@article{christie2026localcorrectionsgeneralizedskills,
	title={From Local Corrections to Generalized Skills: Improving Neuro-Symbolic Policies with MEMO}, 
	author={Benjamin A. Christie and Yinlong Dai and Mohammad Bararjanianbahnamiri and Simon Stepputtis and Dylan P. Losey},
	year={2026},
	eprint={2603.04560},
	archivePrefix={arXiv},
	primaryClass={cs.RO},
	url={https://arxiv.org/abs/2603.04560}, 
}