At Google I/O 2024, CEO Sundar Pichai introduced the company's new multimodal AI agent, Project Astra. The tool can answer users' questions in real time and converse through text, audio, or video. Google showcased the agent in a demo video at its developer conference. Project Astra is positioned to compete directly with OpenAI's multimodal GPT-4o model.
How does Project Astra work?
The standout feature of Google's new AI agent is its awareness of its surroundings. Once granted camera access, it can describe and answer questions about objects in view; point the camera at something and ask a question, and it responds with relevant information.
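Project Astra itself is not publicly available, but the basic interaction it demonstrates, a camera frame plus a question sent to a multimodal Gemini model, can be sketched with Google's public Gemini API. The model name, file name, and prompt below are illustrative assumptions, not Astra's actual pipeline:

```python
# Minimal sketch: ask a multimodal Gemini model about a single camera frame.
# Uses the public google-generativeai SDK; Project Astra's real pipeline is not public.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # assumption: you have a Gemini API key

# "gemini-1.5-flash" is an illustrative model choice, not Astra itself.
model = genai.GenerativeModel("gemini-1.5-flash")

frame = Image.open("camera_frame.jpg")  # a frame captured from the phone camera
question = "What is the object I'm pointing at, and what is it used for?"

# The SDK accepts a mixed list of text and images as a single prompt.
response = model.generate_content([question, frame])
print(response.text)
```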
Google shared a demo video of the tool through Google DeepMind's X (formerly Twitter) handle, demonstrating how it identifies items in a room. Project Astra can be used on smartphones and smart glasses, is integrated with Google Lens, and is built on Google's Gemini AI models.
Google claims Project Astra can process information rapidly, continuously encoding video frames and combining them with speech input so it can recall and answer questions about what it has seen. Google also says it has refined the assistant's speech to make interactions feel more natural.
Finding lost items easily
Like OpenAI's GPT-4o, Google's assistant can answer questions about objects through pictures, video, and text. The demo shared by Google DeepMind made this clear, and it also showed a few capabilities that go further.
With Project Astra, users can ask questions by drawing lines on the screen to highlight an object in the live view. It also remembers what it has recently seen through the camera, so it can answer questions about items captured earlier in the session. If you misplace something in the room and Project Astra's camera has seen it, it can help you locate the item. However, this memory covers only a short window of time.
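That short-lived memory can be pictured as a rolling buffer of recently seen frames: old entries fall out as new ones arrive, so only the recent past is searchable. The sketch below is a simplified illustration of the idea, not Astra's actual implementation; the caption text stands in for whatever description the model extracts from each frame:

```python
# Illustrative sketch of a short, rolling visual memory.
# Each entry pairs a timestamp with a caption describing what was seen;
# the deque's maxlen drops the oldest entries, mimicking a limited memory window.
from collections import deque
from dataclasses import dataclass

@dataclass
class FrameMemory:
    timestamp: float  # seconds since the session started
    caption: str      # hypothetical description extracted from the frame

memory: deque[FrameMemory] = deque(maxlen=300)  # e.g. ~10 s of frames at 30 fps

def observe(timestamp: float, caption: str) -> None:
    """Record what the camera just saw; the oldest entries expire automatically."""
    memory.append(FrameMemory(timestamp, caption))

def where_did_i_leave(item: str) -> str:
    """Search recent memory (newest first) for the last sighting of an item."""
    for entry in reversed(memory):
        if item.lower() in entry.caption.lower():
            return f"Last seen at t={entry.timestamp:.1f}s: {entry.caption}"
    return f"No recent sighting of '{item}'; it may have left the memory window."

observe(1.0, "red glasses case on the desk next to the apple")
observe(2.5, "whiteboard with a system diagram")
print(where_did_i_leave("glasses"))
# -> Last seen at t=1.0s: red glasses case on the desk next to the apple
```

Because the buffer has a fixed capacity, anything seen too long ago simply isn't there to find, which matches the limited-duration recall Google describes.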