Google text detection API - web demo result is different from using the API

I have tried both the Google Vision API text detection feature and Google's web demo to OCR my image. The two results are not the same.

First, I tried the demo at https://cloud.google.com/vision/docs/drag-and-drop. Then I called the Google API from Python code. The two results are not the same and I don't know why. Could you please help me with this problem?

My Python code is here:

import cv2
from google.cloud import vision
from google.cloud.vision import types

client = vision.ImageAnnotatorClient()
# `image` is a numpy array loaded earlier, e.g. with cv2.imread()
raw_byte = cv2.imencode('.jpg', image)[1].tobytes()
post_image = types.Image(content=raw_byte)
image_context = vision.types.ImageContext()  # no language hints set
response = client.text_detection(image=post_image, image_context=image_context)

Upvotes: 0

Views: 497

Answers (2)

Dabbel

Reputation: 2825

This is TypeScript code.

But the idea is to use document_text_detection rather than text_detection (I am unsure what the Python API specifically provides).

Using documentTextDetection() instead of textDetection() solved the exact same problem for me.

const fs = require("fs");
const path = require("path");
const vision = require("@google-cloud/vision");

async function quickstart() {
  let text = '';
  const fileName = "j056vt-_800w_800h_sb.jpg";
  const imageFile = fs.readFileSync(fileName);
  const image = Buffer.from(imageFile).toString("base64");
  const client = new vision.ImageAnnotatorClient();

  const request = {
    image: {
      content: image
    },
    imageContext: {
      languageHints: ["vi-VN"]
    }
  };

  const [result] = await client.documentTextDetection(request);

  // OUTPUT METHOD A

  for (const tmp of result.textAnnotations) {
      text += tmp.description + "\n";
  }

  console.log(text);

  const out = path.basename(fileName, path.extname(fileName)) + ".txt";
  fs.writeFileSync(out, text);

  // OUTPUT METHOD B

  const fullTextAnnotation = result.fullTextAnnotation;
  console.log(`Full text: ${fullTextAnnotation.text}`);
  fullTextAnnotation.pages.forEach(page => {
    page.blocks.forEach(block => {
      console.log(`Block confidence: ${block.confidence}`);
      block.paragraphs.forEach(paragraph => {
        console.log(`Paragraph confidence: ${paragraph.confidence}`);
        paragraph.words.forEach(word => {
          const wordText = word.symbols.map(s => s.text).join("");
          console.log(`Word text: ${wordText}`);
          console.log(`Word confidence: ${word.confidence}`);
          word.symbols.forEach(symbol => {
            console.log(`Symbol text: ${symbol.text}`);
            console.log(`Symbol confidence: ${symbol.confidence}`);
          });
        });
      });
    });
  });

}

quickstart();
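For the Python client, the counterpart of documentTextDetection() is document_text_detection(). A minimal sketch, assuming the google-cloud-vision package (2.x; in 1.x the Image type lives under vision.types) is installed and credentials are configured; the function name and file path are illustrative:

```python
def detect_document_text(path, language_hints=("vi",)):
    """Run DOCUMENT_TEXT_DETECTION on a local image file and return its text.

    Sketch only: requires the google-cloud-vision package and configured
    Google Cloud credentials to actually run.
    """
    from google.cloud import vision  # deferred so the sketch imports cleanly

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())

    response = client.document_text_detection(
        image=image,
        image_context={"language_hints": list(language_hints)},
    )
    # full_text_annotation carries the page/block/paragraph structure
    # iterated over in OUTPUT METHOD B above.
    return response.full_text_annotation.text
```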

Upvotes: 1

Patrick_Weber

Reputation: 120

Actually, comparing both of your results, the only difference I see is the way the results are displayed. The Google Cloud drag-and-drop site displays the results with bounding boxes and tries to find areas of text.

The response you get with your Python script includes the same information. A few examples:

texts = response.text_annotations
print([i.description for i in texts])
# prints the detected text; texts[0] holds the full text block,
# the remaining entries are the individual words

print([i.bounding_poly.vertices for i in texts])
# prints the bounding boxes around the detected words
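To illustrate that the demo's grouped display can be rebuilt from this same information, here is a toy sketch in plain Python (no API call) that regroups word annotations into lines by their bounding-box y coordinates. `Word` is a stand-in for the real annotation objects, and the 10-pixel tolerance is an arbitrary assumption:

```python
from collections import namedtuple

# Stand-in for a word-level annotation: its text and the top y coordinate
# of its bounding box.
Word = namedtuple("Word", ["description", "top"])

def group_into_lines(words, tolerance=10):
    """Group words whose vertical positions differ by at most `tolerance` px."""
    lines = []
    for word in sorted(words, key=lambda w: w.top):
        if lines and abs(lines[-1][-1].top - word.top) <= tolerance:
            lines[-1].append(word)   # same line as the previous word
        else:
            lines.append([word])     # start a new line
    return [" ".join(w.description for w in line) for line in lines]

words = [Word("Hello", 12), Word("world", 14),
         Word("Second", 40), Word("line", 42)]
print(group_into_lines(words))  # ['Hello world', 'Second line']
```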

Feel free to ask more questions for clarification.

A few other thoughts:

  • Are you preprocessing your images?
  • What size are the images?

Upvotes: 0
