Jagadeesh Katla

Reputation: 82

How do I get the OCR PDF layout with the AWS Textract API?

We plan to use the AWS Textract service for document analysis. At present, the results come back as bounding boxes. Does anyone know how to get the exact PDF layout with this service?

OCR PDF document text extraction for document analysis:

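# startJob, isJobComplete and getJobResults are helper functions defined
# elsewhere (not shown here); s3BucketName and documentName locate the PDF in S3.
jobId = startJob(s3BucketName, documentName)
print("Started job with id: {}".format(jobId))

# Read the results only once the job has finished; otherwise `response` is undefined.
if isJobComplete(jobId):
    response = getJobResults(jobId)

    # print(response)

    # Print each detected line of text in blue (ANSI escape codes)
    for resultPage in response:
        for item in resultPage["Blocks"]:
            if item["BlockType"] == "LINE":
                print('\033[94m' + item["Text"] + '\033[0m')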
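
For reference, a minimal sketch of one way to approximate the page layout from the bounding boxes, not an official Textract feature. It assumes each LINE block carries the documented `Geometry` → `BoundingBox` fields (`Left` and `Top` as fractions of the page size) and, for multi-page jobs, a `Page` number. The function name `render_layout` and the grid size are hypothetical choices for illustration. Lines that land on the same grid cell overwrite each other, so the output is only a rough plain-text rendering.

```python
def render_layout(blocks, columns=100, rows=60, page=None):
    """Place each LINE block on a character grid using its bounding box.

    blocks  -- list of Textract block dicts (e.g. one response's "Blocks")
    columns -- grid width in characters (hypothetical default)
    rows    -- grid height in text rows (hypothetical default)
    page    -- if given, keep only blocks whose "Page" equals this number
    """
    grid = [[" "] * columns for _ in range(rows)]

    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue
        # Blocks without a "Page" key are treated as belonging to page 1.
        if page is not None and block.get("Page", 1) != page:
            continue

        box = block["Geometry"]["BoundingBox"]
        # Left/Top are ratios in [0, 1]; scale them to grid coordinates.
        row = min(int(box["Top"] * rows), rows - 1)
        col = min(int(box["Left"] * columns), columns - 1)

        # Write the text into the grid, truncating at the right edge.
        for offset, char in enumerate(block["Text"]):
            if col + offset >= columns:
                break
            grid[row][col + offset] = char

    lines = ["".join(r).rstrip() for r in grid]
    return "\n".join(lines).rstrip()
```

With the code above, this could be called as `print(render_layout(resultPage["Blocks"], page=1))` for each entry in `response`. A larger `rows` value reduces the chance of two lines colliding on the same grid row.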

Upvotes: 1

Views: 368

Answers (0)
