Reputation: 21
I found several older questions on whether and how to do OCR with Cognitive Services. The Cognitive Services documentation has a step-by-step description which says that OCR from PDF is possible and explains how to do it. When I follow the example mentioned at the bottom of the page, I still get an UnsupportedMediaType result:
{ "code": "UnsupportedMediaType", "requestId": "c427e1c7-3f99-4a74-a36f-1620e68e3b64", "message": "Supported media types: application/octet-stream, multipart/form-data or application/json" }
When I change the PDF to an image, everything is fine. I am currently following the Cognitive Services documentation, but while the request seems to be fine, the document type is still reported as unsupported. I call:
https://.cognitiveservices.azure.com/vision/v2.0/ocr?language=de&detectOrientation=true&Ocp-Apim-Subscription-Key=&Content-Type=application/octet-stream
and the file is contained in the body, of course.
I don't post the C# or PowerShell, as the problem indeed seems to be with my request from the URL mentioned above.
Can someone please help me understand how to get a valid request to get text from a PDF with Azure ComputerVision?
Upvotes: 2
Views: 3498
Reputation: 4113
You are getting this error because the OCR API doesn't support PDF. As per the docs:
The OCR API works on images that meet the following requirements:
- The image must be presented in JPEG, PNG, GIF, or BMP format.
- The size of the input image must be between 50 x 50 and 4200 x 4200 pixels.
- The text in the image can be rotated by any multiple of 90 degrees plus a small angle of up to 40 degrees.
That being said, you can use the newer Read API, which does support PDF. As per the docs:
The Read API works with images that meet the following requirements:
- The image must be presented in JPEG, PNG, BMP, PDF, or TIFF format.
- The dimensions of the image must be between 50 x 50 and 10000 x 10000 pixels. PDF pages must be 17 x 17 inches or smaller.
- The file size of the image must be less than 20 megabytes (MB).
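The two format lists above can be turned into a quick local check before any request is sent. This is only a rough sketch: the function names are made up for illustration, the file type is guessed from well-known magic bytes, "20 MB" is assumed to mean 20 * 1024 * 1024 bytes, and the pixel and page-size limits are not checked at all.

```python
# Rough pre-flight check: which of the two APIs should accept this file?
# Only the format lists and the 20 MB limit quoted above are checked.

MAX_READ_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB

# Well-known file signatures (magic bytes) for the formats listed in the docs.
_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]

OCR_FORMATS = {"jpeg", "png", "gif", "bmp"}
READ_FORMATS = {"jpeg", "png", "bmp", "pdf", "tiff"}


def sniff_format(data: bytes):
    """Return a short format name based on the leading bytes, or None."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None


def accepted_by(data: bytes) -> set:
    """Return the subset of {"ocr", "read"} whose listed formats match."""
    fmt = sniff_format(data)
    apis = set()
    if fmt in OCR_FORMATS:
        apis.add("ocr")
    if fmt in READ_FORMATS and len(data) < MAX_READ_BYTES:
        apis.add("read")
    return apis
```

For a PDF, accepted_by returns only "read", which matches the behaviour in the question: the same request works for an image but not for a PDF.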
It should work if you follow these requirements and call the Read endpoint instead of the /ocr endpoint!
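For illustration, here is a minimal sketch of a Read call using only the Python standard library. It assumes the v3.2 route (/vision/v3.2/read/analyze), which may differ from the version available to you, and the helper names, endpoint and key are placeholders. Note that Content-Type and Ocp-Apim-Subscription-Key are sent as HTTP headers here. Read is asynchronous: the POST returns an Operation-Location header, which is then polled until the status is "succeeded".

```python
import json
import time
import urllib.request


def build_read_request(endpoint, key, pdf_bytes, api_version="v3.2"):
    """Build the POST request that submits a document to the Read API.

    Assumption: the v3.2 route; adjust api_version for your resource.
    """
    url = f"{endpoint.rstrip('/')}/vision/{api_version}/read/analyze"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/octet-stream",  # raw file bytes in the body
    }
    return urllib.request.Request(url, data=pdf_bytes, headers=headers, method="POST")


def extract_lines(result):
    """Collect the recognized text lines from a finished Read result."""
    return [
        line["text"]
        for page in result["analyzeResult"]["readResults"]
        for line in page["lines"]
    ]


def read_pdf_text(endpoint, key, pdf_path, poll_seconds=1.0, max_polls=60):
    """Submit a PDF, poll the operation URL, and return the text lines."""
    with open(pdf_path, "rb") as f:
        request = build_read_request(endpoint, key, f.read())
    with urllib.request.urlopen(request) as response:
        # The service answers 202 Accepted and points to the result here.
        operation_url = response.headers["Operation-Location"]

    poll = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    for _ in range(max_polls):
        with urllib.request.urlopen(poll) as response:
            result = json.load(response)
        if result["status"] == "succeeded":
            return extract_lines(result)
        if result["status"] == "failed":
            raise RuntimeError("Read operation failed")
        time.sleep(poll_seconds)
    raise TimeoutError("Read operation did not finish in time")
```

Usage would look like read_pdf_text("https://<your-resource>.cognitiveservices.azure.com", "<your-key>", "scan.pdf"), with your own resource name and key filled in.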
Upvotes: 3