
Semantic Policies

As collaborative robots become increasingly available, we need methodologies and tools that allow them to expand their repertoire of interaction skills. Programming such skills by hand is challenging because it requires anticipating, and reasoning a priori about, the situations that may occur. Imitation learning can ease this process, but many important aspects of a collaborative task cannot be communicated through behavioral demonstrations alone, e.g., the individual segments of the task, the semantic type of behavior being executed, or the name of the target object. Indeed, human teachers and coaches often combine motion and language to convey a variety of information to a student. Consequently, we need new imitation learning approaches that leverage both modalities.

This project investigates how verbal instructions extracted from human speech can be used to segment and semantically annotate human demonstrations. We further show that this information can be used to learn both (a) low-level interaction primitives and (b) higher-level interaction networks that encode the transition model among those primitives. As a result, fewer demonstrations are needed to learn both the motion and the structure underlying the imitated task.
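
The project description does not say how segmentation or the transition model is implemented. As a minimal sketch, assuming speech has already been transcribed into timestamped semantic labels, the Python below splits one demonstration at each utterance and then estimates a first-order Markov transition model among the labeled primitives. That Markov model is one plausible stand-in for the higher-level interaction network, not necessarily the project's method. The `Utterance` type, `segment_demonstration`, and `transition_model` are hypothetical names introduced for illustration, and learning the low-level primitives from each segment's samples is left out.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical type: assumes speech was already transcribed and mapped
    # to a semantic label with a timestamp.
    time: float  # seconds since the start of the demonstration
    label: str   # semantic label extracted from speech, e.g. "grasp"


def segment_demonstration(trajectory, utterances):
    """Split a timestamped trajectory into speech-labeled segments.

    trajectory: list of (time, state) tuples sorted by time.
    utterances: list of Utterance sorted by time. Each utterance opens a
    segment that lasts until the next utterance (or the end of the
    trajectory). Samples recorded before the first utterance are dropped.
    """
    segments = []
    for i, utt in enumerate(utterances):
        end = utterances[i + 1].time if i + 1 < len(utterances) else float("inf")
        samples = [state for t, state in trajectory if utt.time <= t < end]
        segments.append((utt.label, samples))
    return segments


def transition_model(label_sequences):
    """Estimate P(next primitive | current primitive) by counting.

    Assumption: a first-order Markov chain over primitive labels stands in
    for the higher-level interaction network.
    """
    counts = defaultdict(Counter)
    for labels in label_sequences:
        for current, nxt in zip(labels, labels[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


# Usage with made-up toy data (not results from the project).
traj = [(0.0, "s0"), (0.5, "s1"), (1.0, "s2"), (1.5, "s3"), (2.0, "s4")]
utts = [Utterance(0.0, "reach"), Utterance(1.0, "grasp"), Utterance(1.8, "handover")]
segs = segment_demonstration(traj, utts)
# segs == [("reach", ["s0", "s1"]), ("grasp", ["s2", "s3"]), ("handover", ["s4"])]

model = transition_model([
    ["reach", "grasp", "handover"],
    ["reach", "grasp", "place"],
])
# model == {"reach": {"grasp": 1.0}, "grasp": {"handover": 0.5, "place": 0.5}}
```

Because each demonstration is cut at the spoken labels rather than by an unsupervised change-point method, every segment arrives already annotated, which is what lets the transition counts be pooled across demonstrations by label.
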

In addition, giving robots the ability to use speech can substantially improve the safety of human-robot collaboration, since a human partner can state intentions and issue commands directly instead of relying on the robot to infer them from motion alone. For robots and intelligent machines to interact with a human partner, they need to interpret and understand our intentions. To this end, robots must be able to read our bodily expressions and movements as well as our verbal requests and commands.