Google Gemini Image Understanding Component

Details

Asset Version: 1.0.0

Last Published: Nov 28, 2025

The Google Gemini Image Understanding Component connects your app to Google’s multimodal AI capabilities to analyze and interpret image-based content. You can upload an image (for example, JPEG or PNG) from either the device camera or the gallery, along with an optional text prompt. The AI model then generates a response based on the visual elements, objects, text, and contextual details present in the image.
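As a rough illustration of the image-plus-optional-prompt flow described above, the sketch below assembles a request body in the shape accepted by the public Gemini REST `generateContent` endpoint (base64 image data in an `inline_data` part, followed by an optional `text` part). This listing does not document how the component or its Foundry service builds requests, so the helper name `build_image_request` and the `image/` prefix check are assumptions made for illustration only.

```python
import base64
from typing import Optional


def build_image_request(image_bytes: bytes, mime_type: str,
                        prompt: Optional[str] = None) -> dict:
    """Hypothetical helper: build a Gemini-style request body for one image.

    This is NOT the component's API; it only mirrors the public REST
    payload shape (contents -> parts -> inline_data / text).
    """
    # Assumption: accept any image MIME type, e.g. image/jpeg or image/png.
    if not mime_type.startswith("image/"):
        raise ValueError(f"not an image MIME type: {mime_type}")

    # The image travels as base64 text inside an inline_data part.
    parts = [{
        "inline_data": {
            "mime_type": mime_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }]

    # The text prompt is optional, so add it only when one was given.
    if prompt:
        parts.append({"text": prompt})

    return {"contents": [{"parts": parts}]}
```

A call such as `build_image_request(photo_bytes, "image/jpeg", "What does this sign say?")` yields a dictionary that can be serialized with `json.dumps` before being sent; omitting the prompt produces a body with the image part alone.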

Requirements

HCL Volt MX Iris
HCL Volt MX Foundry

Devices

Platforms

Documentation

Features:

Full-Context Visual Analysis: Understands entire images holistically, including layout, composition, and spatial relationships, to derive context-aware insights beyond basic object or text detection.
Multi-Modal Intelligence: Processes both visual and textual elements such as charts, diagrams, graphs, infographics, and handwritten content, enabling deeper semantic interpretation.
Structured Output Generation: Delivers results as JSON or plain text, suitable for integration into analytics, automation, or reporting systems.
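Because results may arrive as either JSON or plain text, a consuming system typically needs to tell the two apart before passing them on. The following is a minimal sketch of one way to do that; the function name `parse_model_reply` and the rule that only JSON objects and arrays count as structured output are assumptions for illustration, not behavior documented for this component.

```python
import json
from typing import Any, Tuple


def parse_model_reply(reply_text: str) -> Tuple[str, Any]:
    """Hypothetical helper: classify a reply as structured JSON or plain text.

    Returns ("json", parsed_value) when the reply is a JSON object or array,
    otherwise ("text", stripped_reply).
    """
    cleaned = reply_text.strip()
    try:
        value = json.loads(cleaned)
    except json.JSONDecodeError:
        # Not valid JSON, so treat the reply as free-form text.
        return ("text", cleaned)

    # Assumption: bare numbers or strings are treated as text, since only
    # objects and arrays are useful as structured records downstream.
    if isinstance(value, (dict, list)):
        return ("json", value)
    return ("text", cleaned)
```

An analytics or reporting pipeline could then route `"json"` results into a structured store and keep `"text"` results as captions or notes.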
