Google text detection API - web demo result is different from using the API

I have tried both the Google Vision API text detection feature and Google's web demo to OCR my image. The two results are not the same.

First, I tried the demo at https://cloud.google.com/vision/docs/drag-and-drop. Then I called the Google API from Python code. The two results are not the same and I don't know why. Could you please help me with this problem?

My Python code is here:

import cv2
from google.cloud import vision
from google.cloud.vision import types

client = vision.ImageAnnotatorClient()
# `image` is a numpy array loaded earlier, e.g. with cv2.imread()
raw_byte = cv2.imencode('.jpg', image)[1].tobytes()
post_image = types.Image(content=raw_byte)
image_context = vision.types.ImageContext()  # no language hints set
response = client.text_detection(image=post_image, image_context=image_context)

Upvotes: 0

Views: 497

Answers (2)

Dabbel

Reputation: 2825

This is TypeScript code.

But the idea is to use document_text_detection rather than text_detection (I am unsure what the Python API specifically provides).

Using documentTextDetection() instead of textDetection() solved the exact same problem for me.

const fs = require("fs");
const path = require("path");
const vision = require("@google-cloud/vision");

async function quickstart() {
  let text = '';
  const fileName = "j056vt-_800w_800h_sb.jpg";
  const imageFile = fs.readFileSync(fileName);
  const image = Buffer.from(imageFile).toString("base64");
  const client = new vision.ImageAnnotatorClient();

  const request = {
    image: {
      content: image
    },
    imageContext: {
      languageHints: ["vi-VN"]
    }
  };

  const [result] = await client.documentTextDetection(request);

  // OUTPUT METHOD A

  for (const tmp of result.textAnnotations) {
      text += tmp.description + "\n";
  }

  console.log(text);

  const out = path.basename(fileName, path.extname(fileName)) + ".txt";
  fs.writeFileSync(out, text);

  // OUTPUT METHOD B

  const fullTextAnnotation = result.fullTextAnnotation;
  console.log(`Full text: ${fullTextAnnotation.text}`);
  fullTextAnnotation.pages.forEach(page => {
    page.blocks.forEach(block => {
      console.log(`Block confidence: ${block.confidence}`);
      block.paragraphs.forEach(paragraph => {
        console.log(`Paragraph confidence: ${paragraph.confidence}`);
        paragraph.words.forEach(word => {
          const wordText = word.symbols.map(s => s.text).join("");
          console.log(`Word text: ${wordText}`);
          console.log(`Word confidence: ${word.confidence}`);
          word.symbols.forEach(symbol => {
            console.log(`Symbol text: ${symbol.text}`);
            console.log(`Symbol confidence: ${symbol.confidence}`);
          });
        });
      });
    });
  });

}

quickstart();
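For the Python client, the counterpart of documentTextDetection() is document_text_detection(). A minimal sketch, assuming the google-cloud-vision package (2.x; in 1.x the Image type lives under vision.types) is installed and credentials are configured; the function name and file path are illustrative:

```python
def detect_document_text(path, language_hints=("vi",)):
    """Run DOCUMENT_TEXT_DETECTION on a local image file and return its text.

    Sketch only: requires the google-cloud-vision package and configured
    Google Cloud credentials to actually run.
    """
    from google.cloud import vision  # deferred so the sketch imports cleanly

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())

    response = client.document_text_detection(
        image=image,
        image_context={"language_hints": list(language_hints)},
    )
    # full_text_annotation carries the page/block/paragraph structure
    # iterated over in OUTPUT METHOD B above.
    return response.full_text_annotation.text
```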

Upvotes: 1

Patrick_Weber

Reputation: 120

Actually, comparing both of your results, the only difference I see is the way the results are displayed. The Google Cloud drag-and-drop site displays the results with bounding boxes and tries to find areas of text.

The response you get with your Python script includes the same information. A few examples:

texts = response.text_annotations
print([i.description for i in texts])
# prints the detected text; texts[0] holds the full text block,
# the remaining entries are the individual words

print([i.bounding_poly.vertices for i in texts])
# prints the bounding boxes around the detected words
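To illustrate that the demo's grouped display can be rebuilt from this same information, here is a toy sketch in plain Python (no API call) that regroups word annotations into lines by their bounding-box y coordinates. `Word` is a stand-in for the real annotation objects, and the 10-pixel tolerance is an arbitrary assumption:

```python
from collections import namedtuple

# Stand-in for a word-level annotation: its text and the top y coordinate
# of its bounding box.
Word = namedtuple("Word", ["description", "top"])

def group_into_lines(words, tolerance=10):
    """Group words whose vertical positions differ by at most `tolerance` px."""
    lines = []
    for word in sorted(words, key=lambda w: w.top):
        if lines and abs(lines[-1][-1].top - word.top) <= tolerance:
            lines[-1].append(word)   # same line as the previous word
        else:
            lines.append([word])     # start a new line
    return [" ".join(w.description for w in line) for line in lines]

words = [Word("Hello", 12), Word("world", 14),
         Word("Second", 40), Word("line", 42)]
print(group_into_lines(words))  # ['Hello world', 'Second line']
```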

Feel free to ask more questions for clarification.

A few other thoughts:

  • Are you preprocessing your images?
  • What size are the images?

Upvotes: 0
