Reputation: 301
I'm trying to use boto3 to send a Textract detect_document_text request.
I'm using the following code:
client = boto3.client('textract')
response = client.detect_document_text(
    Document={
        'Bytes': image_b64['document_b64']
    }
)
Here image_b64['document_b64'] is a base64-encoded image string that I generated with, for example, the https://base64.guru/converter/encode/image website.
But I'm getting the following error:
UnsupportedDocumentException
What am I doing wrong?
Upvotes: 3
Views: 2352
Reputation: 1437
This worked for me. It assumes you have configured ~/.aws with your AWS credentials. The example calls analyze_expense, but detect_document_text takes the same Document parameter.
import boto3
import os

IMAGE_DIR = './imgs'  # folder holding the local images

def main():
    client = boto3.client('textract', region_name="ca-central-1")
    # Send every file in IMAGE_DIR to Textract as raw bytes
    for image_name in os.listdir(IMAGE_DIR):
        image_path = os.path.join(IMAGE_DIR, image_name)
        with open(image_path, "rb") as f:
            # Document takes either Bytes or S3Object; a local file goes in Bytes
            response = client.analyze_expense(
                Document={
                    'Bytes': f.read()
                })
        print(response)

if __name__ == "__main__":
    main()
Upvotes: 0
Reputation: 111
With boto3 in a Jupyter notebook, for an image (.jpg or .png), you can use:
import boto3

# images_path is the path to a local .jpg or .png file
with open(images_path, "rb") as img_file:
    img_str = bytearray(img_file.read())

textract = boto3.client('textract')
response = textract.detect_document_text(Document={'Bytes': img_str})
Upvotes: 0
Reputation: 301
For future reference, I solved the problem by decoding the base64 string before passing it to Textract:
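import base64
import boto3

client = boto3.client('textract')
# Textract expects raw image bytes, so decode the base64 string first
image_64_decode = base64.b64decode(image_b64['document_b64'])
image_bytes = bytearray(image_64_decode)  # avoid shadowing the built-in `bytes`
response = client.detect_document_text(
    Document={
        'Bytes': image_bytes
    }
)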
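If you want to check locally why a payload might be rejected, one option is to look at the leading "magic" bytes before calling Textract. This is only a rough sketch: sniff_format is a hypothetical helper name, and the list of formats (PNG, JPEG, PDF, TIFF) is my assumption of what Textract accepts, so check the current docs. It shows that a base64-encoded PNG no longer starts with the PNG signature, while the decoded bytes do.

```python
import base64

# Leading signatures of the formats assumed to be accepted (PNG, JPEG, PDF, TIFF).
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"%PDF": "pdf",
    b"II*\x00": "tiff",  # little-endian TIFF
    b"MM\x00*": "tiff",  # big-endian TIFF
}

def sniff_format(data):
    """Return a format name if the bytes-like `data` starts with a known signature, else None."""
    head = bytes(data[:8])
    for magic, name in MAGIC_NUMBERS.items():
        if head.startswith(magic):
            return name
    return None

# A fake PNG header is enough to show the difference.
png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
encoded = base64.b64encode(png_header)

print(sniff_format(encoded))                    # None: still base64 text
print(sniff_format(base64.b64decode(encoded)))  # png: raw bytes
```

If this returns None for the value you are about to put in Bytes, the data is probably still encoded or is in a format the service does not read.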
Upvotes: 2
Reputation: 5828
Per the documentation:
If you're using an AWS SDK to call Amazon Textract, you might not need to base64-encode image bytes passed using the Bytes field.
Base64 encoding is only required when invoking the REST API directly. When using the Python or Node.js SDK, pass native (binary) bytes.
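In practice that means turning whatever base64 text you have back into raw bytes before handing it to the SDK. Here is a minimal sketch, assuming the input may carry a "data:image/png;base64," prefix (online converters can emit one) and may contain line breaks; to_raw_bytes is a hypothetical helper name, not part of boto3.

```python
import base64

def to_raw_bytes(encoded):
    """Decode a base64 string (optionally wrapped in a data URI) into raw bytes."""
    if isinstance(encoded, (bytes, bytearray)):
        encoded = bytes(encoded).decode("ascii")
    # Drop a "data:<mime>;base64," prefix if one is present.
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    # Remove whitespace and line breaks, then decode strictly.
    encoded = "".join(encoded.split())
    return base64.b64decode(encoded, validate=True)

sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("ascii")
print(to_raw_bytes(sample))  # b'\x89PNG'

# The result is what goes into the Bytes field, for example:
# response = client.detect_document_text(Document={'Bytes': to_raw_bytes(image_b64['document_b64'])})
```

With validate=True, stray characters raise binascii.Error instead of being silently dropped, which makes a bad input easier to spot.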
Upvotes: 1