SECURE Overview

SECURE: Semantics-aware Embodied Conversation under Unawareness for Lifelong Robot Learning

An interactive task learning framework that copes with unforeseen possibilities by exploiting
the formal semantic analysis of embodied conversation


Rimvydas Rubavicius, Peter David Fagan, Alex Lascarides, Subramanian Ramamoorthy

Centre for AI in Assistive Autonomy

University of Edinburgh

Abstract: This paper addresses a challenging interactive task learning scenario we call rearrangement under unawareness: an agent must manipulate a rigid-body environment without knowing a key concept necessary for solving the task and must learn about it during deployment. For example, the user may ask to "put the two granny smith apples inside the basket", but the agent cannot correctly identify which objects in the environment are "granny smith" as the agent has not been exposed to such a concept before. We introduce SECURE, an interactive task learning policy designed to tackle such scenarios. The unique feature of SECURE is its ability to enable agents to engage in semantic analysis when processing embodied conversations and making decisions. Through embodied conversation, a SECURE agent adjusts its deficient domain model by engaging in dialogue to identify and learn about previously unforeseen possibilities. The SECURE agent learns from the user's embodied corrective feedback when mistakes are made and strategically engages in dialogue to uncover useful information about novel concepts relevant to the task. These capabilities enable the SECURE agent to generalize to new tasks with the acquired knowledge. We demonstrate in the simulated Blocksworld and the real-world apple manipulation environments that the SECURE agent, which solves such rearrangements under unawareness, is more data-efficient than agents that do not engage in embodied conversation or semantic analysis.

Introduction

Grounding DINO Predictions
Figure 1: Comparison between Grounding DINO predictions and the ground-truth domain model.

Background: Semantic Analysis

  • Formal semantic analysis makes it possible to interpret embodied conversation messages and derive their logical consequences
  • Sentence-level analysis: "Put the two granny smiths inside a basket" entails that "there are only two granny smiths"
  • Discourse-level analysis: a correction of a pick action with the message "No. This is golden delicious" entails that the picked object "is not granny smith"
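
The two entailments above can be read as constraints that rule out candidate interpretations of the scene. The following is a minimal sketch of that idea, not the SECURE implementation: it assumes a hypothetical scene with four objects (`o1` to `o4`), represents each possible world as an assignment of "is granny smith" to every object, and filters the worlds by each entailed constraint. All function and variable names are illustrative assumptions.

```python
from itertools import product

# Hypothetical object identifiers; the source does not name the objects.
OBJECTS = ["o1", "o2", "o3", "o4"]


def all_worlds(objects):
    """Enumerate every assignment of granny_smith(o) to True or False."""
    return [dict(zip(objects, values))
            for values in product([False, True], repeat=len(objects))]


def exactly_n(n):
    """Sentence-level constraint: exactly n objects are granny smiths."""
    return lambda world: sum(world.values()) == n


def is_not(obj):
    """Discourse-level constraint: the corrected object is not a granny smith."""
    return lambda world: not world[obj]


def update(worlds, constraint):
    """Keep only the worlds that are consistent with the constraint."""
    return [w for w in worlds if constraint(w)]


worlds = all_worlds(OBJECTS)
# "Put the two granny smiths inside a basket" entails exactly two granny smiths.
after_instruction = update(worlds, exactly_n(2))
# The agent picks o1 and hears "No. This is golden delicious".
after_correction = update(after_instruction, is_not("o1"))

print(len(worlds), len(after_instruction), len(after_correction))
```

Each message shrinks the set of consistent worlds, which is why an agent that draws these entailments may need fewer interactions than one that treats the correction as a single negative label.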
Pointing Illustration

Framework Overview

  • The agent's belief state contains a domain theory built over the course of embodied conversation and exemplars of observation-symbol pairs from experience.
  • The dialogue strategy measures the value of asking the teacher a given question.
  • Query value is measured using expected information gain: \(I(b,a) = H(b) - \mathbb{E}_{\phi \sim \mathrm{Result}(a)}[H(\mathrm{Update}(b,\phi))]\)
  • The value function weights the query value against the expected reward, which includes the cost of a wrong prediction: \(Q(b,a)=\theta_1I(b,a)+\theta_2\mathbb{E}_{b}[R(a)]\)
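
The two formulas above can be illustrated with a small discrete example. This is a sketch under stated assumptions rather than the authors' implementation: the belief \(b\) is a probability distribution over hypothetical worlds, an action is described by a hypothetical `answer_fn` (the response each world would produce) and `reward_fn` (the reward in each world), and \(\mathrm{Update}\) is plain Bayesian conditioning. The query cost of 0.1, the pick rewards of +1 and -1, and the treatment of a pick as uninformative are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def entropy(belief):
    """Shannon entropy H(b) in bits of a distribution {world: probability}."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def update(belief, answer_fn, answer):
    """Update(b, phi): condition the belief on an observed answer, then renormalise."""
    kept = {w: p for w, p in belief.items() if answer_fn(w) == answer}
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}


def information_gain(belief, answer_fn):
    """I(b, a) = H(b) - E_phi[H(Update(b, phi))]."""
    answer_prob = defaultdict(float)
    for world, p in belief.items():
        answer_prob[answer_fn(world)] += p
    expected_posterior = sum(
        p_ans * entropy(update(belief, answer_fn, ans))
        for ans, p_ans in answer_prob.items()
    )
    return entropy(belief) - expected_posterior


def q_value(belief, answer_fn, reward_fn, theta1=1.0, theta2=1.0):
    """Q(b, a) = theta1 * I(b, a) + theta2 * E_b[R(a)]."""
    expected_reward = sum(p * reward_fn(w) for w, p in belief.items())
    return theta1 * information_gain(belief, answer_fn) + theta2 * expected_reward


# Hypothetical belief: one of four objects is the target, all equally likely.
belief = {"o1": 0.25, "o2": 0.25, "o3": 0.25, "o4": 0.25}

# Query "is o1 the target?" with an assumed fixed cost of 0.1.
q_ask = q_value(belief, lambda w: w == "o1", lambda w: -0.1)

# Pick o1 directly: assumed uninformative, +1 if correct, -1 if wrong.
q_pick = q_value(belief, lambda w: None, lambda w: 1.0 if w == "o1" else -1.0)

best = "ask" if q_ask > q_pick else "pick"
print(best)
```

With a uniform belief, the expected penalty for a wrong pick outweighs the query cost, so asking scores higher in this toy setting. Changing \(\theta_1\) and \(\theta_2\) shifts the balance between seeking information and acting.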
Task Illustration

Belief Update Examples


Figure 2: Human-robot interaction using embodied conversation, either to ask a question that reduces uncertainty
about the domain or to process the user's corrective feedback after a wrong action

Experiments and Results

Citation

@misc{secure2025,
title={SECURE: Semantics-aware Embodied Conversation under Unawareness for Lifelong Robot Learning},
author={Rimvydas Rubavicius and Peter David Fagan and Alex Lascarides and Subramanian Ramamoorthy},
year={2025},
eprint={2409.17755},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2409.17755},
}