Anton

Reputation: 127

Azure OCR difference between demo page and console

I have several examples of images I need to recognize with OCR.

I've tried recognizing them on the demo page https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/ and it works quite well. I use the "Read text in images" option, which for my images works even better than "Read handwritten text from images".

But when I make the same REST call from a script (following the example given in the documentation), the results are much worse: some letters are recognized incorrectly, and some are missed entirely. If I run the same example from the development console https://westcentralus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fc/console I still get the same poor results.
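For reference, the call from my script looks roughly like this (a minimal sketch assuming Python with the requests library; the subscription key and image URL are placeholders, not my real values):

import requests

# Placeholders -- substitute a real subscription key and image URL.
SUBSCRIPTION_KEY = "<your-subscription-key>"
OCR_URL = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/ocr"

# Same parameters as the documentation example: unknown language, auto-detect orientation.
response = requests.post(
    OCR_URL,
    params={"language": "unk", "detectOrientation": "true"},
    headers={
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
    },
    json={"url": "https://example.com/my-image.png"},
)
response.raise_for_status()
print(response.json())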

What can cause this difference? How can I fix it to get results as reliable as the ones the demo page produces?

Please let me know if any additional information is required.

UPD: Since I couldn't find any solution, or even an explanation of the difference, I've created a sample file (similar to my actual files) so you can have a look. The file URL is http://sfiles.herokuapp.com/sample.png

If it is used on the demo page https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/ in the "Read text in images" section, the resulting JSON is:

{
  "status": "Succeeded",
  "succeeded": true,
  "failed": false,
  "finished": true,
  "recognitionResult": {
    "lines": [
      {
        "boundingBox": [
          307,
          159,
          385,
          158,
          386,
          173,
          308,
          174
        ],
        "text": "October 2011",
        "words": [
          {
            "boundingBox": [
              308,
              160,
              357,
              160,
              357,
              174,
              308,
              175
            ],
            "text": "October"
          },
          {
            "boundingBox": [
              357,
              160,
              387,
              159,
              387,
              174,
              357,
              174
            ],
            "text": "2011"
          }
        ]
      },
      {
        "boundingBox": [
          426,
          157,
          519,
          158,
          519,
          173,
          425,
          172
        ],
        "text": "07UC14PII0244",
        "words": [
          {
            "boundingBox": [
              426,
              160,
              520,
              159,
              520,
              174,
              426,
              174
            ],
            "text": "07UC14PII0244"
          }
        ]
      }
    ]
  }
}

If I use this file in the console and make the following call:

POST https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/ocr?language=unk&detectOrientation=true HTTP/1.1
Host: westcentralus.api.cognitive.microsoft.com
Content-Type: application/json
Ocp-Apim-Subscription-Key: ••••••••••••••••••••••••••••••••

{"url":"http://sfiles.herokuapp.com/sample.png"}

I get a different result:

{
  "language": "el",
  "textAngle": 0.0,
  "orientation": "Up",
  "regions": [{
    "boundingBox": "309,161,75,10",
    "lines": [{
      "boundingBox": "309,161,75,10",
      "words": [{
        "boundingBox": "309,161,46,10",
        "text": "October"
      }, {
        "boundingBox": "358,162,26,9",
        "text": "2011"
      }]
    }]
  }, {
    "boundingBox": "428,161,92,10",
    "lines": [{
      "boundingBox": "428,161,92,10",
      "words": [{
        "boundingBox": "428,161,92,10",
        "text": "071_lC14P110244"
      }]
    }]
  }]
}

As you can see, the result is totally different (even the JSON format differs). Does anyone know what I'm doing wrong? Or am I missing something, and the "Read text in images" demo does not actually use the ocr method of the API?

I will be very grateful for any help.

Upvotes: 1

Views: 1725

Answers (1)

cthrash

Reputation: 2973

There are two flavors of OCR in Microsoft Cognitive Services. The newer endpoint (/recognizeText) has better recognition capabilities, but currently only supports English. The older endpoint (/ocr) has broader language coverage.
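To get output matching the demo page (whose JSON shape, with "recognitionResult", corresponds to the newer endpoint), you would call /recognizeText instead of /ocr. A minimal sketch, assuming the v2.0 Recognize Text API with Python and the requests library (the subscription key is a placeholder; the endpoint is asynchronous, so you POST the image and then poll the URL returned in the Operation-Location header):

import time
import requests

SUBSCRIPTION_KEY = "<your-subscription-key>"  # placeholder
BASE_URL = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0"

# Submit the image to the asynchronous Recognize Text endpoint.
submit = requests.post(
    BASE_URL + "/recognizeText",
    params={"mode": "Printed"},
    headers={
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
    },
    json={"url": "http://sfiles.herokuapp.com/sample.png"},
)
submit.raise_for_status()

# The service responds with 202 Accepted and an Operation-Location header
# pointing at the result resource; poll it until recognition has finished.
operation_url = submit.headers["Operation-Location"]
while True:
    result = requests.get(
        operation_url,
        headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
    ).json()
    if result.get("status") in ("Succeeded", "Failed"):
        break
    time.sleep(1)

print(result)

The polled result should have the same "status"/"recognitionResult" structure shown in the question, which is why the demo output and the /ocr output look so different.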

Some additional details about the differences are in this post.

Upvotes: 1
