The core role of cameras in general humanoid robots
Visual perception and understanding
Cameras serve as the primary interface through which general humanoid robots acquire visual information from the external world. By capturing images of the surrounding environment and converting them into digital signals, they provide the fundamental data for subsequent visual analysis and processing. For instance, in logistics and warehousing scenarios, humanoid robots use cameras to capture the position, color, and identification features of goods, enabling precise recognition and, combined with path planning algorithms, efficient sorting and storage. The "Qinglong" humanoid robot, unveiled at the 2024 World Artificial Intelligence Conference, has 43 joints and moves flexibly. It senses its surroundings through cameras, allowing it to pick up objects as small as 2 centimeters and even use tools to pick sesame seeds out of a pile of millet.
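To make the goods-recognition idea concrete, here is a minimal sketch of locating an item by color in a digitized camera frame. It assumes the frame has already been converted into a 2D grid of (R, G, B) pixels; the frame contents, the target color, and the tolerance are all illustrative, not a real warehouse pipeline.

```python
def find_colored_item(frame, target, tol=30):
    """Return the centroid and bounding box of pixels near `target` color."""
    hits = [
        (x, y)
        for y, row in enumerate(frame)
        for x, (r, g, b) in enumerate(row)
        if all(abs(c - t) <= tol for c, t in zip((r, g, b), target))
    ]
    if not hits:
        return None
    xs, ys = zip(*hits)
    centroid = (sum(xs) / len(xs), sum(ys) / len(ys))
    bbox = (min(xs), min(ys), max(xs), max(ys))  # x_min, y_min, x_max, y_max
    return centroid, bbox

# Toy 4x4 frame: a red 2x2 item on a black background (hypothetical data).
K, R = (0, 0, 0), (255, 40, 40)
frame = [
    [K, K, K, K],
    [K, R, R, K],
    [K, R, R, K],
    [K, K, K, K],
]
centroid, bbox = find_colored_item(frame, target=(255, 40, 40))
```

A production system would of course use learned detectors rather than a fixed color threshold, but the output, a position plus extent for each item, is exactly what a downstream path planner consumes.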
Target recognition and tracking
With the aid of computer vision technology, general humanoid robots can perform in-depth analysis of the image data captured by their cameras, recognizing different targets such as people, vehicles, and specific objects, and continuously tracking them. In security monitoring, a humanoid robot's cameras can track the movement trajectories of individuals in real time; once abnormal behaviors such as loitering, running, or trespassing into restricted areas are detected, alarms are triggered immediately, providing strong support for safety prevention. Humanoid robots that provide reception services in hotels and shopping malls, equipped with high-performance cameras, can respond efficiently to customer needs in real time.
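The abnormal-behavior checks above can be sketched as simple rules over a tracked trajectory. This is a hedged illustration, assuming the tracker already outputs (x, y) positions at fixed intervals; the restricted zone, speed limit, and loitering threshold are made-up values.

```python
import math

RESTRICTED = (10, 10, 20, 20)   # hypothetical zone: x_min, y_min, x_max, y_max
SPEED_LIMIT = 3.0               # m/s above which we flag "running"
LOITER_RADIUS = 1.0             # net displacement (m) below which we flag loitering

def check_track(track, dt=1.0):
    """track: list of (x, y) positions sampled every `dt` seconds."""
    alarms = set()
    # Running: any step faster than the speed limit.
    for p0, p1 in zip(track, track[1:]):
        if math.dist(p0, p1) / dt > SPEED_LIMIT:
            alarms.add("running")
    # Trespassing: any position inside the restricted rectangle.
    x_min, y_min, x_max, y_max = RESTRICTED
    if any(x_min <= x <= x_max and y_min <= y <= y_max for x, y in track):
        alarms.add("trespassing")
    # Loitering: a long track with almost no net movement.
    if len(track) >= 5 and math.dist(track[0], track[-1]) < LOITER_RADIUS:
        alarms.add("loitering")
    return alarms

# A track that sprints toward and enters the restricted zone.
alarms = check_track([(0, 0), (4, 0), (8, 0), (12, 12)])
```

Real deployments replace these thresholds with learned behavior models, but the structure, detections feeding per-track rules that raise alarms, is the same.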
Navigation and obstacle avoidance assistance
For mobile general-purpose robots, cameras are among the core sensors for autonomous navigation and obstacle avoidance. Through real-time analysis of images of the surrounding environment, robots can identify key information such as roads, obstacles, and landmarks. Combined with data from other sensors such as lidar and ultrasonic sensors, they can perform multi-source information fusion to plan safe, efficient travel paths and avoid collisions with obstacles. The Unitree H1 humanoid robot from Unitree, equipped with the Intel RealSense D435i depth camera and the DJI Livox Mid-360 lidar module, adopts 360° panoramic depth perception, allowing it to precisely grasp every detail of its surroundings. It can move at speeds of up to 3.3 meters per second and can start, run, and jump flexibly in real-world environments.
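The fusion-then-plan loop can be illustrated on an occupancy grid: each sensor contributes obstacle cells, and a shortest-path search plans around the merged map. This is a minimal sketch under strong assumptions (a shared grid frame, already-classified detections, breadth-first search standing in for a real planner); the grid size and detections below are hypothetical.

```python
from collections import deque

def fuse(size, *sensor_hits):
    """Mark a cell occupied if any sensor reports it (conservative OR-fusion)."""
    grid = [[0] * size for _ in range(size)]
    for hits in sensor_hits:
        for x, y in hits:
            grid[y][x] = 1
    return grid

def plan(grid, start, goal):
    """Breadth-first search for a shortest collision-free grid path."""
    size = len(grid)
    prev, queue = {start: None}, deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:         # walk back through predecessors
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < size and 0 <= ny < size and not grid[ny][nx] and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None                             # goal unreachable

camera_hits = [(1, 1), (1, 2)]              # e.g. visually detected boxes
lidar_hits = [(1, 0)]                       # e.g. a lidar return
grid = fuse(4, camera_hits, lidar_hits)
path = plan(grid, start=(0, 0), goal=(3, 0))
```

Note how the lidar hit at (1, 0) closes a gap the camera missed, forcing the planner to detour around the whole obstacle column; that complementarity is the point of multi-source fusion.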
Enhanced human-machine interaction
In terms of human-machine interaction, cameras help general humanoid robots recognize and understand human facial expressions, hand gestures, and more, enabling more natural and smooth interaction. Through facial recognition, robots can identify different users and provide personalized services based on each user's identity and preferences. Gesture recognition allows users to control the robot with simple hand movements, without the need for complex operation interfaces. In education, humanoid robots can capture students' expressions through cameras, understand their learning status and emotions, and adjust teaching methods in a timely manner to improve teaching effectiveness.
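Once face and gesture recognizers emit labels, the interaction layer reduces to a dispatch table. The sketch below assumes the recognizers are upstream black boxes; the user profiles, gesture vocabulary, and command names are all invented for illustration, not any real robot API.

```python
# Hypothetical per-user profiles from face recognition.
PROFILES = {
    "alice": {"language": "en", "greeting": "Hello"},
    "bob": {"language": "zh", "greeting": "Ni hao"},
}

# Hypothetical mapping from recognized gesture labels to robot commands.
GESTURE_COMMANDS = {
    "wave": "greet",
    "palm_up": "stop",
    "point_left": "turn_left",
    "point_right": "turn_right",
}

def interact(user_id, gesture):
    """Combine face-recognition identity with a gesture-driven command."""
    profile = PROFILES.get(user_id, {"language": "en", "greeting": "Hello"})
    command = GESTURE_COMMANDS.get(gesture, "idle")  # unknown gestures do nothing
    if command == "greet":
        return f"{profile['greeting']}, {user_id}!"   # personalized response
    return command

greeting = interact("alice", "wave")
command = interact("bob", "point_left")
```

Keeping the recognition models separate from this mapping means the gesture vocabulary can be extended, or personalized per user, without retraining anything.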