Abstract: This paper addresses a challenging interactive task learning scenario we call rearrangement under unawareness:
an agent must manipulate a rigid-body environment without knowing a key concept necessary for solving the task and must learn about it during deployment.
For example, the user may ask to "put the two granny smith apples inside the basket", but the agent cannot correctly identify which objects in the
environment are "granny smith" because it has not been exposed to that concept before. We introduce SECURE, an interactive task learning policy
designed to tackle such scenarios. The unique feature of SECURE is its ability to enable agents to engage in semantic analysis when processing embodied
conversations and making decisions. Through embodied conversation, a SECURE agent adjusts its deficient domain model by engaging in dialogue to identify
and learn about previously unforeseen possibilities. The SECURE agent learns from the user's embodied corrective feedback when mistakes are made and
strategically engages in dialogue to uncover useful information about novel concepts relevant to the task. These capabilities enable the SECURE
agent to generalize to new tasks with the acquired knowledge. We demonstrate in the simulated Blocksworld and the real-world apple manipulation
environments that the SECURE agent, which solves such rearrangements under unawareness, is more data-efficient than agents that do not engage in
embodied conversation or semantic analysis.
Introduction
Figure 1: Comparison between Grounding DINO predictions and the ground-truth domain model.
In real-world scenarios, a robot often has to solve tasks under unawareness: it holds uncertain and false beliefs about the structure and parameters of the domain model (see Figure 1).
Embodied conversation allows the agent to cope with unawareness by enabling interactive symbol grounding.
We propose an interactive task learning framework that processes embodied conversation using semantic analysis, making the robot semantics-aware.
Background: Semantic Analysis
Formal semantic analysis makes it possible to interpret messages in an embodied conversation together with their logical consequences.
Sentence-level analysis: "Put the two granny smiths inside a basket" entails that "there are only two granny smiths".
Discourse-level analysis: a correction on a pick with the message "No. This is golden delicious" entails that the picked object "is not a granny smith" (see the sketch below).
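To make these entailments concrete, here is a minimal Python sketch of how they might be represented as symbolic constraints on grounding. The `Fact` structure, predicate names, and object identifiers are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    """A symbolic fact about one object, e.g. granny_smith(obj_1) = False."""
    predicate: str
    obj: str
    holds: bool

def sentence_level_entailments(count: int, predicate: str, objects: list[str]) -> dict:
    """'Put the two granny smiths inside a basket' entails that exactly
    `count` of the scene objects satisfy `predicate`: a cardinality
    constraint over all candidate groundings."""
    return {"predicate": predicate, "count": count, "domain": objects}

def discourse_level_entailments(picked: str, stated: str, intended: str) -> list[Fact]:
    """A correction 'No. This is golden delicious' on a pick entails that the
    picked object is a golden delicious and is not a granny smith."""
    return [Fact(stated, picked, True), Fact(intended, picked, False)]

# Example: the robot picked obj_1 while searching for a granny smith.
constraint = sentence_level_entailments(2, "granny_smith", ["obj_1", "obj_2", "obj_3"])
facts = discourse_level_entailments("obj_1", "golden_delicious", "granny_smith")
```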
Framework Overview
The agent's belief state contains a domain theory built over the course of embodied conversation and exemplars of observation-symbol pairs gathered from experience.
The dialogue strategy measures the value of asking the teacher particular questions.
The query value is measured using expected information gain: \(I(b,a) = H(b) - \mathbb{E}_{\phi \sim \mathrm{Result}(a)}[H(\mathrm{Update}(b,\phi))]\).
The query value is then weighted against the expected reward, which includes the cost of a wrong prediction, in the value function \(Q(b,a)=\theta_1 I(b,a)+\theta_2\mathbb{E}_{b}[R(a)]\); a minimal sketch follows.
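As a worked illustration, and under simplifying assumptions rather than as the authors' implementation, the belief can be represented as a probability distribution over candidate groundings, and each possible answer to a question as the set of hypotheses consistent with it. The formulas above then read:

```python
import math

def entropy(belief: dict[str, float]) -> float:
    """Shannon entropy H(b) of a belief over grounding hypotheses."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def update(belief: dict[str, float], consistent: set[str]) -> dict[str, float]:
    """Update(b, phi): drop hypotheses inconsistent with answer phi, renormalize."""
    z = sum(belief[h] for h in consistent)
    return {h: belief[h] / z for h in consistent}

def info_gain(belief: dict[str, float], answers: list[set[str]]) -> float:
    """I(b, a) = H(b) - E_{phi ~ Result(a)}[H(Update(b, phi))], where each
    possible answer phi is given by the set of hypotheses consistent with it."""
    expected = sum(
        sum(belief[h] for h in phi) * entropy(update(belief, phi))
        for phi in answers if any(belief[h] > 0 for h in phi)
    )
    return entropy(belief) - expected

def q_value(belief, answers, expected_reward, theta1=1.0, theta2=1.0) -> float:
    """Q(b, a) = theta_1 * I(b, a) + theta_2 * E_b[R(a)]."""
    return theta1 * info_gain(belief, answers) + theta2 * expected_reward

# Four equally likely groundings; a yes/no question that splits them in half
# is worth exactly one bit of information.
b = {"h1": 0.25, "h2": 0.25, "h3": 0.25, "h4": 0.25}
print(info_gain(b, [{"h1", "h2"}, {"h3", "h4"}]))  # 1.0
```

Here \(\theta_1\) and \(\theta_2\) trade off curiosity (reducing uncertainty about the domain) against expected task reward, including the cost of acting on a wrong prediction.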
Belief Update Examples
Figure 2: Human-robot interaction using embodied conversation, either to ask a question that reduces uncertainty about the domain or to process the user's corrective feedback after a wrong action.
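Continuing the illustrative sketch above, a correction such as "No. This is golden delicious" can be processed as evidence that eliminates every grounding hypothesis inconsistent with the discourse-level entailment. The hypothesis space below is hypothetical.

```python
# Hypothetical grounding hypotheses: each maps scene objects to fruit concepts.
hypotheses = {
    "h1": {"obj_1": "granny_smith", "obj_2": "granny_smith"},
    "h2": {"obj_1": "golden_delicious", "obj_2": "granny_smith"},
    "h3": {"obj_1": "golden_delicious", "obj_2": "golden_delicious"},
}
belief = {"h1": 1 / 3, "h2": 1 / 3, "h3": 1 / 3}

# "No. This is golden delicious" (on picking obj_1) entails that obj_1 is a
# golden delicious, not a granny smith: keep only the consistent hypotheses
# and renormalize.
consistent = {h for h, g in hypotheses.items() if g["obj_1"] == "golden_delicious"}
z = sum(belief[h] for h in consistent)
belief = {h: belief[h] / z for h in sorted(consistent)}
print(belief)  # {'h2': 0.5, 'h3': 0.5}
```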
Experiments and Results
We evaluate different agents in a simulated blocks domain and a real-world fruit domain, in which agents start unaware of key domain-level concepts and, through interaction, learn to ground the newly discovered concepts.
Engaging in embodied conversation and processing it with formal semantic analysis has compounding benefits for bootstrapping interactive task learning.
Semantics-aware agents can cope with false initial beliefs and revise them using evidence acquired from extended interaction in the domain.
Citation
@misc{secure2025,
      title={SECURE: Semantics-aware Embodied Conversation under Unawareness for Lifelong Robot Learning},
      author={Rimvydas Rubavicius and Peter David Fagan and Alex Lascarides and Subramanian Ramamoorthy},
      year={2025},
      eprint={2409.17755},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2409.17755},
}