Published by dugufeng
Author: dugufeng
Tags: language:english, vision, multimodal, qwen-vl, siliconflow
This is a straightforward multimodal workflow. It receives an uploaded image and uses the Qwen2.5-VL-32B-Instruct model to return a concise description of the image in English.
v1.9.0+ (Please fill in the Dify version you have tested)1. SiliconFlow API Key:
langgenius/siliconflow provider.imageUrl variable, upload an image (or provide a URL).text output (the image description).file type input named imageUrl.Qwen/Qwen2.5-VL-32B-Instruct model with vision enabled.imageUrl as context.text (string) description.

vision/image-recognition-en