Three reasons why robots are about to have their "ChatGPT moment."

2024-07-23

Since the field's earliest days, roboticists have aspired to build robots that can handle household chores. For a long time, though, that remained an elusive dream.

Although roboticists have been able to make robots perform some impressive feats in laboratories, such as parkour, these tasks typically require meticulous planning in a strictly controlled environment.

That makes it hard for robots to work reliably at home, especially in households with children and pets. Every house is also laid out differently, and all kinds of chaotic situations can arise.

There is a famous observation in robotics known as Moravec's paradox: what is hard for humans is easy for machines, and what is easy for humans is hard for machines.

Now, with artificial intelligence, that is changing. Robots are starting to be able to perform tasks such as folding clothes and cooking, which not long ago were considered almost impossible.

In the cover story of the latest issue of "MIT Technology Review," I explored how the field of robotics is reaching a turning point.


The field of robotics research has seen an incredibly exciting convergence of technologies, which may (just may) allow robots to step out of the laboratory and into our homes.

Here are three reasons why robotics is about to experience its "ChatGPT moment."

---

Affordable hardware makes research easier to carry out

Robots are expensive. Highly sophisticated robots start at hundreds of thousands of dollars, putting them out of reach for most researchers. The PR2, one of the first home robots, weighed 440 pounds and was priced at $400,000.

But new, cheaper robots are letting more researchers do cool things. A startup called Hello Robot has developed and launched a robot named Stretch, which costs about $18,000 and weighs about 22.6 kilograms (roughly 50 pounds).

It has a small mobile base, a pole with a camera hanging from it, and an adjustable arm ending in a suction-cup gripper, and it can be driven with a handheld controller.

Meanwhile, a team from Stanford University has built a system called Mobile ALOHA (an acronym for "Affordable Low-cost Open-source Hardware for Autonomous Robotic Operations"), which learned to cook shrimp from just 20 human demonstrations, plus data from other tasks.

The researchers cobbled together a cheaper robot from off-the-shelf components, one that costs tens of thousands of dollars instead of hundreds of thousands.

Artificial intelligence is helping us build a "robot brain"

The software of these new robots is different from that of the past. Due to the rapid development of artificial intelligence, the current research focus is shifting from making expensive robots more flexible to building a "universal robot brain" in the form of neural networks.

Roboticists have begun using deep learning and neural networks to create systems that practice in their environment, learn from it, and adjust their behavior accordingly, rather than relying on traditional planning and programming.

In the summer of 2023, Google introduced RT-2, a vision-language-action model. The model gains a general understanding of the world from online text and images, as well as from its own interactions, and translates that knowledge into robot actions.
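
To make the idea concrete, here is a minimal, hypothetical sketch of the vision-language-action pattern: a camera image and a text instruction go in, discretized action tokens come out, and the tokens are decoded into a motion command. The model call and helper names are illustrative stand-ins, not Google's actual RT-2 interface.

```python
# Hypothetical vision-language-action (VLA) sketch: image + instruction -> action tokens.
# Nothing here is the real RT-2 API; the model call is a stand-in.

import numpy as np

ACTION_BINS = 256  # each action dimension is discretized into 256 bins


def vla_model(image: np.ndarray, instruction: str) -> list[int]:
    """Stand-in for a trained VLA model: returns one token per action dimension."""
    # A real model would run a vision-language transformer here.
    rng = np.random.default_rng(abs(hash(instruction)) % 2**32)
    return rng.integers(0, ACTION_BINS, size=7).tolist()


def decode_action(tokens: list[int]) -> dict:
    """Map discrete tokens back to a continuous end-effector command."""
    scaled = [t / (ACTION_BINS - 1) * 2 - 1 for t in tokens]  # rescale to [-1, 1]
    return {
        "delta_xyz": scaled[0:3],   # gripper translation
        "delta_rpy": scaled[3:6],   # gripper rotation
        "gripper": "close" if scaled[6] > 0 else "open",
    }


camera_frame = np.zeros((224, 224, 3), dtype=np.uint8)  # placeholder image
print(decode_action(vla_model(camera_frame, "pick up the sponge")))
```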

With the help of an artificial-intelligence learning technique called imitation learning, combined with generative artificial intelligence, researchers from the Toyota Research Institute, Columbia University, and the Massachusetts Institute of Technology have been able to quickly teach robots many new tasks. They believe they have found a way to carry generative artificial intelligence from the realms of text, images, and video into the domain of robot motion.
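
In its simplest form (behavioral cloning), imitation learning just means fitting a model that maps what the robot observes to the action a human demonstrator took in the same situation. Below is a toy sketch of that idea, assuming PyTorch; the observation size, network, and random stand-in data are purely illustrative.

```python
# Toy behavioral cloning: learn to copy demonstrated actions from observations.
# Shapes and the network are illustrative; real systems use camera images.

import torch
from torch import nn

obs_dim, act_dim = 16, 7                      # e.g. joint states in, arm command out
demos_obs = torch.randn(20 * 100, obs_dim)    # ~20 demonstrations of ~100 steps each
demos_act = torch.randn(20 * 100, act_dim)    # actions the human teleoperator took

policy = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(200):
    pred = policy(demos_obs)                          # what the policy would do
    loss = nn.functional.mse_loss(pred, demos_act)    # distance from the demonstration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At run time, the robot feeds its current observation through the same network.
print(policy(torch.randn(1, obs_dim)))
```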

Many are trying to harness generative artificial intelligence. Covariant, a robotics startup spun out of OpenAI's now-defunct robotics research division, has developed a multimodal model called RFM-1.

It accepts prompts in the form of text, images, videos, robot instructions, or measurements. Generative artificial intelligence lets the robot both understand instructions and generate images or videos relating to those tasks.

More data, more skills

The formidable capabilities of large artificial intelligence models like GPT-4 come from the vast amounts of data collected from the internet. That approach does not work for robots, which need data collected specifically from robots.

They need demonstration data on how to operate washing machines and refrigerators, as well as how to pick up plates, how to fold clothes, and so on. Currently, such data is very scarce, and it takes humans a long time to collect it.
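
As a rough picture of what this kind of robot data looks like, each time step of a demonstration pairs what the robot sensed with what the human teleoperator commanded next. The field names below are my own illustration, not the schema of any particular dataset.

```python
# Hypothetical sketch of one time step of demonstration data.
# Field names are illustrative, not any real dataset's schema.

from dataclasses import dataclass

import numpy as np


@dataclass
class DemoStep:
    camera_rgb: np.ndarray       # what the robot's camera saw, e.g. (224, 224, 3)
    joint_positions: np.ndarray  # where the arm currently is
    language_task: str           # e.g. "put the plate in the dishwasher"
    action: np.ndarray           # what the human teleoperator commanded next


step = DemoStep(
    camera_rgb=np.zeros((224, 224, 3), dtype=np.uint8),
    joint_positions=np.zeros(7),
    language_task="fold the towel",
    action=np.zeros(7),
)
# A full demonstration is a long list of such steps; collecting thousands of them
# by teleoperating a real robot is what makes this data slow and expensive to gather.
```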

Google DeepMind has launched a new initiative called the Open X-Embodiment Collaboration, which aims to change that.

In 2023, the company collaborated with 34 research laboratories and approximately 150 researchers to collect data from 22 different robots, including Hello Robot's Stretch robot.

The resulting dataset, released in October 2023, covers 527 robot skills, such as picking up, pushing, and moving objects.

Early signs suggest that more data is producing smarter robots. The researchers built two versions of a robot model called RT-X, one that can run locally on computers in individual laboratories and one that can be accessed over the internet.
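
The pooling step behind a dataset like this is conceptually simple, even if the engineering is not: trajectories recorded on very different robots are mapped into one shared format so a single model can be trained on all of them. A hedged sketch follows; the field names and the 7-dimensional action convention are assumptions for illustration only.

```python
# Hypothetical sketch of pooling data from different robots into one training set.
# Each lab's data arrives in its own action space; here everything is mapped to a
# shared 7-dimensional convention so one model can train on all of it.

import numpy as np


def to_shared_format(example: dict, robot_name: str) -> dict:
    """Normalize one step of data from any robot into a common schema (illustrative)."""
    action = np.asarray(example["action"], dtype=np.float32)
    action = np.pad(action, (0, max(0, 7 - action.size)))[:7]  # pad or trim to 7 dims
    return {
        "robot": robot_name,
        "image": example["image"],
        "instruction": example["instruction"],
        "action": action,
    }


stretch_step = {"image": np.zeros((224, 224, 3)), "instruction": "wipe the table",
                "action": [0.1, 0.0, -0.2]}                   # a low-DoF mobile manipulator
arm_step = {"image": np.zeros((224, 224, 3)), "instruction": "pick up the cup",
            "action": [0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 1.0]}    # a 7-DoF arm

pooled = [to_shared_format(stretch_step, "stretch"),
          to_shared_format(arm_step, "lab_arm")]
print(len(pooled), pooled[0]["action"].shape)
```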

The larger, internet-accessible model is pretrained on internet data, giving it a "visual common sense," or a baseline understanding of the world, drawn from large language and image models.

When the researchers ran the RT-X model on many different robots, they found that the robots learned skills with a 50% higher success rate than with the systems each laboratory had been developing on its own.
