We plan to use the AWS Textract service for document analysis. At the moment the results come back in bounding-box format. Does anyone know how to get the exact PDF layout with this service?
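One way to approximate the page layout is sketched below. It is a sketch, not a reproduction of the PDF, and it relies on the fact that each Textract `BoundingBox` reports `Left` and `Top` as ratios of the page width and height. Under that assumption, LINE blocks can be grouped into rows by similar `Top` values and placed at character columns proportional to `Left`. The function name `render_page_layout` and the `columns` and `row_tolerance` parameters are hypothetical choices for illustration.

```python
from typing import Dict, List


def render_page_layout(blocks: List[Dict], columns: int = 100, row_tolerance: float = 0.01) -> str:
    """Hypothetical helper: approximate one page's layout as plain text.

    Assumes Textract-style blocks whose BoundingBox Left/Top are ratios
    of the page size (0.0 to 1.0).
    """
    lines = [b for b in blocks if b.get("BlockType") == "LINE"]
    # Order lines top-to-bottom, then left-to-right.
    lines.sort(key=lambda b: (b["Geometry"]["BoundingBox"]["Top"],
                              b["Geometry"]["BoundingBox"]["Left"]))

    rows: List[List[Dict]] = []
    for line in lines:
        top = line["Geometry"]["BoundingBox"]["Top"]
        # Join the current row if this line starts at roughly the same height
        # as the row's first line; otherwise start a new row.
        if rows and abs(top - rows[-1][0]["Geometry"]["BoundingBox"]["Top"]) <= row_tolerance:
            rows[-1].append(line)
        else:
            rows.append([line])

    output = []
    for row in rows:
        row.sort(key=lambda b: b["Geometry"]["BoundingBox"]["Left"])
        text = ""
        for line in row:
            # Map the horizontal ratio onto a fixed-width character grid.
            col = int(line["Geometry"]["BoundingBox"]["Left"] * columns)
            if text:
                # Pad to the target column, keeping at least one space between segments.
                text += " " * max(1, col - len(text))
            else:
                text = " " * col
            text += line["Text"]
        output.append(text)
    return "\n".join(output)
```

Pass it the blocks of a single page. The default tolerance of 0.01 (1% of the page height) is a guess that will likely need tuning per document, and multi-column or rotated pages will need more than this simple row grouping.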
OCR PDF document text extraction for document analysis:
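```python
# startJob, isJobComplete and getJobResults are helper functions defined
# elsewhere (not shown); they appear to follow the AWS Textract
# asynchronous text-detection example for Python.
jobId = startJob(s3BucketName, documentName)
print("Started job with id: {}".format(jobId))

# Wait for the asynchronous job, then fetch every page of results.
if isJobComplete(jobId):
    response = getJobResults(jobId)
    # print(response)

    # Print detected text, one LINE block at a time (blue, via ANSI escape codes)
    for resultPage in response:
        for item in resultPage["Blocks"]:
            if item["BlockType"] == "LINE":
                print('\033[94m' + item["Text"] + '\033[0m')
```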
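The loop above prints lines in whatever order Textract returns them, across all pages at once. A small variation collects LINE text per document page using the `Page` field that asynchronous Textract jobs attach to each block, then sorts each page top-to-bottom and left-to-right by its bounding box. `lines_by_page` is a hypothetical helper, and falling back to page 1 when a block has no `Page` key is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def lines_by_page(result_pages: List[Dict]) -> Dict[int, List[str]]:
    """Hypothetical helper: group LINE text by page from paginated Textract results."""
    pages = defaultdict(list)
    for result_page in result_pages:
        for block in result_page["Blocks"]:
            if block["BlockType"] != "LINE":
                continue
            box = block["Geometry"]["BoundingBox"]
            # Assumption: default to page 1 if the block carries no "Page" key.
            pages[block.get("Page", 1)].append((box["Top"], box["Left"], block["Text"]))
    # Sort each page by (Top, Left) to get a simple reading order.
    return {page: [text for _, _, text in sorted(items)]
            for page, items in sorted(pages.items())}
```

Sorting purely by `Top` can misorder lines that sit on the same visual row but whose tops differ slightly, so a row tolerance is still needed if the goal is to mimic the printed layout.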