How do I get the OCR PDF layout with the AWS Textract API?

Question

We plan to use the AWS Textract service for document analysis. At present, the results come back as blocks with bounding boxes. Does anyone know how to reproduce the original PDF layout from this output?

OCR PDF document text extraction for document analysis

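# startJob, isJobComplete, and getJobResults are helper functions
# assumed to be defined elsewhere (not shown in this question).
jobId = startJob(s3BucketName, documentName)
print("Started job with id: {}".format(jobId))

# Default to an empty list so the loop below is safe if the job did not complete
response = []
if isJobComplete(jobId):
    response = getJobResults(jobId)

# print(response)

# Print detected text, one LINE block at a time (ANSI blue)
for resultPage in response:
    for item in resultPage["Blocks"]:
        if item["BlockType"] == "LINE":
            print('\033[94m' + item["Text"] + '\033[0m')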
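
One way to approximate the page layout from the bounding boxes is to place each LINE block on a character grid. Each Textract block carries a Geometry.BoundingBox with Left and Top given as ratios (0 to 1) of the page width and height, and asynchronous results also include a Page number. The sketch below is only an approximation under those assumptions: render_layout, cols, and rows are hypothetical names chosen for illustration, not part of the Textract API, and the result is plain text, not an exact copy of the PDF.

```python
from collections import defaultdict


def render_layout(blocks, cols=100, rows=60):
    """Approximate each page as text by placing LINE blocks on a character grid.

    blocks: list of Textract block dicts (as found in response["Blocks"]).
    cols, rows: hypothetical grid size; tune to the page's font size.
    Returns a dict mapping page number -> rendered text.
    """
    # page -> row index -> list of (column index, text)
    pages = defaultdict(lambda: defaultdict(list))
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue
        box = block["Geometry"]["BoundingBox"]
        # Left/Top are ratios of the page size; scale them to grid cells.
        row = min(int(box["Top"] * rows), rows - 1)
        col = min(int(box["Left"] * cols), cols - 1)
        # Synchronous responses may omit "Page"; assume a single page then.
        pages[block.get("Page", 1)][row].append((col, block["Text"]))

    output = {}
    for page in sorted(pages):
        text_rows = []
        for row in range(rows):
            line = ""
            for col, text in sorted(pages[page].get(row, [])):
                pad = col - len(line)
                if pad > 0:
                    line += " " * pad   # indent to the block's column
                elif line:
                    line += " "         # overlapping blocks: keep one space
                line += text
            text_rows.append(line)
        # Drop trailing empty rows but keep blank rows between lines.
        output[page] = "\n".join(text_rows).rstrip("\n")
    return output
```

With the response from the snippet above, the blocks of all result pages could be collected first, for example blocks = [b for resultPage in response for b in resultPage["Blocks"]], and then passed to render_layout(blocks). A finer grid keeps columns and tables better aligned; a coarser grid merges lines that sit close together.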

Answers (0)
