Quadruped robots equipped with manipulators could tackle tasks that involve handling objects while moving quickly through their surroundings. These include tasks such as collecting trash around the house, fetching specific objects and bringing them to people, or depositing target items at designated locations.
Many approaches to training robots for these tasks rely on imitation learning. This means that the algorithms planning the robot's actions learn policies for completing a task by processing demonstration data that shows how other agents have performed it.
While some existing methods for training robots on tasks that combine locomotion and object manipulation have achieved promising results in simulation, they often do not perform as well "in the wild"; that is, they do not allow robots to generalize across tasks when tested in real-world environments.
Researchers at UC San Diego recently introduced WildLMa, a new framework that could improve the long-horizon loco-manipulation skills of quadruped robots in the wild. This framework, outlined in a paper on the arXiv preprint server, has three components that collectively boost the generalizability of skills learned via imitation learning.
“The rapid progress in imitation learning has enabled robots to learn from human demonstrations,” Yuchen Song, an author of the paper, told Tech Xplore.
“However, these systems often focus on isolated, specific skills and they struggle to adapt to new environments. Our work aims to overcome this limitation by training robots to acquire generalizable skills using Vision-Language Models (VLMs) and then leveraging Large Language Models (LLMs) to chain these skills into sequences that allow the robots to tackle complex tasks.”
WildLMa, the framework devised by Song and his colleagues, first provides a simple way to collect expert demonstration data. This is achieved via a virtual reality (VR)-based teleoperation system in which a human operator can leverage pre-trained robot control algorithms and use a single hand to control the robot's whole-body movements.
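The article does not spell out the exact teleoperation interface, but a rough sketch of how single-hand whole-body teleoperation of this kind is commonly wired up might look like the following snippet. All class and field names here (WholeBodyController, teleop_step, the VR frame format) are hypothetical placeholders, not WildLMa's actual API: the operator's hand pose sets a gripper target, while a pre-trained controller handles the rest of the body.

```python
import numpy as np

# Hypothetical sketch: one VR frame is mapped to a robot command, with a
# pre-trained whole-body controller coordinating legs and arm together.

class WholeBodyController:
    """Stand-in for a pre-trained low-level controller."""
    def compute_action(self, ee_target_pos, ee_target_quat, base_cmd):
        # In practice this would be a learned or model-based controller
        # that outputs joint commands for the legs and arm jointly.
        return {"ee_pos": ee_target_pos, "ee_quat": ee_target_quat, "base": base_cmd}

def teleop_step(vr_hand_pose, joystick, controller):
    """Map the operator's single-hand pose plus a joystick to a robot command."""
    ee_target_pos = vr_hand_pose["position"]      # hand position -> gripper target
    ee_target_quat = vr_hand_pose["orientation"]  # hand orientation -> gripper orientation
    base_cmd = np.array([joystick["forward"], joystick["lateral"], joystick["yaw"]])
    return controller.compute_action(ee_target_pos, ee_target_quat, base_cmd)

# Example frame of (made-up) VR input
frame = {"position": np.array([0.45, 0.0, 0.30]),
         "orientation": np.array([0.0, 0.0, 0.0, 1.0])}
cmd = teleop_step(frame, {"forward": 0.2, "lateral": 0.0, "yaw": 0.0}, WholeBodyController())
print(cmd)
```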
“These pretrained skills are then enhanced by LLMs, which break down complex tasks into manageable steps—similar to how a human might approach a challenge (e.g., ‘pick—navigate—place’),” explained Song. “The result is a robot capable of executing long, multi-step tasks efficiently and intuitively.”
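As a loose illustration of this kind of LLM-based skill chaining, a planner might ask a language model to decompose a task into calls to a fixed skill library and then execute the steps in order. The prompt, skill names, and parser below are invented for the example and are not the authors' actual pipeline:

```python
# Illustrative sketch of LLM-based skill chaining (hypothetical skill library).

SKILL_LIBRARY = {"pick", "navigate", "place"}

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM API call; returns a canned plan for the demo."""
    return "navigate(table); pick(cup); navigate(trash_bin); place(cup)"

def plan_task(task: str) -> list[tuple[str, str]]:
    prompt = (f"Decompose the task '{task}' into a sequence of skills "
              f"from {sorted(SKILL_LIBRARY)}, formatted as skill(argument).")
    steps = []
    for step in query_llm(prompt).split(";"):
        name, arg = step.strip().rstrip(")").split("(")
        if name in SKILL_LIBRARY:  # keep only skills the robot actually has
            steps.append((name, arg))
    return steps

def execute(steps):
    for name, arg in steps:
        print(f"executing skill '{name}' on '{arg}'")  # would invoke the learned policy

execute(plan_task("throw the cup on the table into the trash bin"))
```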
A defining feature of the team's approach is that it also integrates attention mechanisms, which allow the robot to focus on a target object while it completes a specific task.
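One common way to realize such language-conditioned attention, shown here only as an assumed sketch rather than WildLMa's exact architecture, is to compare a text-query embedding from a vision-language model with per-patch image embeddings and use the resulting weights as a saliency map over the scene:

```python
import numpy as np

# Assumed sketch of language-conditioned visual attention: a text embedding
# is compared with image-patch embeddings, and the attention map highlights
# the target object for the downstream manipulation policy.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_map(patch_features, text_feature):
    """patch_features: (N_patches, D); text_feature: (D,), e.g. from a VLM such as CLIP."""
    patches = patch_features / np.linalg.norm(patch_features, axis=1, keepdims=True)
    text = text_feature / np.linalg.norm(text_feature)
    scores = patches @ text          # cosine similarity of query vs. each patch
    return softmax(scores)           # weights sum to 1; high weight = likely target

# Toy example with random features standing in for VLM outputs
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 64))  # 4x4 grid of patch embeddings
query = rng.normal(size=64)          # embedding of e.g. "the red cup"
print(attention_map(patches, query).reshape(4, 4).round(3))
```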
“The integration of attention mechanisms plays a critical role in making the robot’s skills more adaptable and generalizable,” said Song. “WildLMa’s potential applications include practical household chores, such as tidying up or retrieving items. We’ve already demonstrated some of these capabilities.”
Song and his colleagues have already demonstrated the potential of their framework in a series of real-world experiments, successfully training a four-legged robot to complete a variety of tasks. These included cleaning up trash in hallways and outdoor spaces at UC San Diego, picking up food deliveries, and rearranging items on a bookshelf.
“While our system performs well, it can still be affected by unexpected disturbances, such as people moving around,” added Song. “Our next steps will involve making the system more robust in dynamic environments. Ultimately, we aim to create home assistant robots that are affordable and accessible to everyone.”