Dolev Shmueli

Reputation: 1

Integrating the ChatGPT API into a project to detect objects in an image

I am having trouble using the ChatGPT API to detect objects in an image, and I am not sure whether the API is capable of doing so. According to the OpenAI docs, it should be: https://platform.openai.com/docs/guides/vision?lang=node

This is the response I get from the API:

console.log response from API: I'm sorry but as an AI text-based model, I don't have the capability to see or analyze images. Could you provide a textual description of the items?

Could someone clarify this for me?

I expected to get a list of item names and their quantities to use with the Edamam API.

This is the code:

// openAI.test.ts
import env from "dotenv";
// import path from "path";

import { fetchChatCompletion, Message } from "../../services/openAI";
env.config();
// const filePath = path.join(__dirname, "../public/images.jpeg");
const imageURL =
  "https://www.diabetesfoodhub.org/system/user_files/Images/1837-diabetic-pecan-crusted-chicken-breast_JulAug20DF_clean-simple_061720.jpg";
// console.log("imageURL", filePath);
const message: Message[] = [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: "Hello, could you please provide me the name of each item and the quantity of each item in the image for using it in Edamam API?",
      },
      {
        type: "image_url",
        image_url: { url: imageURL },
      },
    ],
  },
];

describe("fetchChatCompletion", () => {
  test("fetches chat completion successfully", async () => {
    try {
      const response = await fetchChatCompletion(message);
      expect(response).toBeDefined();
      console.log("response from API: \n", response);
    } catch (error) {
      console.error("API Error: ", error.message);
    }
  }, 30000); // Increase timeout to 30 seconds
});


// openAI.ts
import OpenAI from "openai";
import dotenv from "dotenv";
// import readline from "readline";

dotenv.config();

export interface Message {
  role: "user" | "assistant";
  content: [
    { type: "text"; text: string },
    { type: "image_url"; image_url: object }
  ];
}

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

async function fetchChatCompletion(messages: Message[]): Promise<string> {
  console.log("messages: \n", messages);

  const imageUrl = messages[0].content[1].image_url;
  console.log("imageUrl: ", imageUrl);

  try {
    const response = await openai.chat.completions.create({
      model: "gpt-4", // Ensure the model name is correctly specified
      messages: messages.map((msg) => ({
        role: msg.role,
        content: msg.content
          .map((contentItem) => {
            if (contentItem.type === "text") {
              return contentItem.text;
            } else if (contentItem.type === "image_url") {
              // Handle image URL as text, since ChatGPT can't process images
              return `Image URL: ${contentItem.image_url}`;
            }
            return ""; // Fallback for unknown content types
          })
          .join(" "), // Combine text and image URL descriptions into a single string
      })),
    });

    const latestResponse = response.choices[0].message.content;
    // console.log("latestResponse", latestResponse);
    return latestResponse;
  } catch (error) {
    console.error("API Error: ", error.message);
    return error.message;
  }
}
export { fetchChatCompletion };

Upvotes: 0

Views: 439

Answers (1)

Remo H. Jansen

Reputation: 25009

The format of the content that you are passing:

       content: msg.content
          .map((contentItem) => {
            if (contentItem.type === "text") {
              return contentItem.text;
            } else if (contentItem.type === "image_url") {
              // Handle image URL as text, since ChatGPT can't process images
              return `Image URL: ${contentItem.image_url}`;
            }
            return ""; // Fallback for unknown content types
          })
          .join(" "), // Combine text and image URL descriptions into a single string

is different from the one in the documentation:

        content: [
          { type: "text", text: "What’s in this image?" },
          {
            type: "image_url",
            image_url: {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            },
          },
        ],
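
To see why the flattened version cannot work, here is a minimal sketch that reproduces what the map/join produces. The type names, the `flattenContent` helper, and the example.com URL are illustrative assumptions, not part of the question's code. Because `image_url` is an object, interpolating it into a template string gives `[object Object]`, so the model only ever receives plain text and never sees the URL or the image.

```typescript
// Content parts in the shape used in the question (type names are illustrative).
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type ContentPart = TextPart | ImagePart;

// Hypothetical helper that mirrors the map/join from the question.
function flattenContent(parts: ContentPart[]): string {
  return parts
    .map((part) =>
      part.type === "text" ? part.text : `Image URL: ${part.image_url}`
    )
    .join(" ");
}

const parts: ContentPart[] = [
  { type: "text", text: "What is in this image?" },
  // Placeholder URL, used only for this demonstration.
  { type: "image_url", image_url: { url: "https://example.com/food.jpg" } },
];

const flattened = flattenContent(parts);
console.log(flattened);
// prints: What is in this image? Image URL: [object Object]
```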

The model is also different: you are using gpt-4, while the documentation uses gpt-4o. The docs mention the following:

Both GPT-4o and GPT-4 Turbo have vision capabilities, meaning the models can take in images and answer questions about them. Historically, language model systems have been limited by taking in a single input modality, text.
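
Putting both points together, a minimal sketch of the request could look like the following. It assumes the content array is passed through unchanged and that gpt-4o is the model you want. `VisionMessage` and `buildVisionRequest` are hypothetical names introduced here, and the URL is a placeholder. I have not run this against the API, so treat it as a starting point rather than a verified fix.

```typescript
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };

// Assumption: image parts are sent in user messages, so only that role is modelled.
interface VisionMessage {
  role: "user";
  content: Array<TextPart | ImagePart>;
}

// Hypothetical helper: builds the request body without flattening the content.
function buildVisionRequest(messages: VisionMessage[]) {
  return {
    model: "gpt-4o", // a vision-capable model, per the docs quoted above
    messages: messages.map((msg) => ({
      role: msg.role,
      content: msg.content, // keep the array of text/image parts as-is
    })),
  };
}

const request = buildVisionRequest([
  {
    role: "user",
    content: [
      { type: "text", text: "List each item in the image and its quantity." },
      // Placeholder URL, used only for this sketch.
      { type: "image_url", image_url: { url: "https://example.com/food.jpg" } },
    ],
  },
]);

// In openAI.ts this object would be passed to openai.chat.completions.create(request).
console.log(JSON.stringify(request, null, 2));
```

The only changes compared with the question's code are the model name and the removal of the map/join on `content`.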

Upvotes: 0
