user21067592

Reputation: 11

AWS State Machine that creates a document classifier endpoint only works with fewer than a certain number of documents

I really appreciate anyone being willing to help. I’ve been following along with this AWS blog post:

https://aws.amazon.com/blogs/machine-learning/intelligently-split-multi-form-document-packages-with-amazon-textract-and-amazon-comprehend/

and its GitHub repository: https://github.com/aws-samples/aws-document-classifier-and-splitter/

I am struggling to get workflow1 to work for my use case. When I give it an S3 location that contains more than a certain number of training documents, my state machine loops endlessly between check table status -> is table complete -> wait for object processing. It is able to create the endpoint when I pass far fewer multipage PDFs.

In function 4, the check to see if the table is full is:

    if rows_not_filled == 0:
        return True
    else:
        return False

Even when the table was full (I looked through the table and could not find any empty rows), the check seemed to return False. So I think I am being throttled somewhere by DynamoDB, but I don't know where, and I don't know how to fix it. I believe I have tried both the on-demand and the provisioned capacity options for the table, but I get the same problem whenever I feed too many documents to the application.

I am almost positive that the issue lies with DynamoDB, because Textract has no trouble writing all of the document text to the table. I simply cannot get the table check to return True, and since I am a novice with AWS, I do not know what I need to modify to fix this. I have been stuck for weeks and AWS support has not been able to help, so I would be grateful for any help troubleshooting where the error is.
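One thing that may be worth ruling out when debugging a check like this: a single DynamoDB Scan call returns at most 1 MB of data, so any count built from only the first page can be wrong once the table grows. The sketch below is not the code from function 4 (its body is not shown here); it only illustrates a paginated version of the same `rows_not_filled == 0` check. The `scan` argument is assumed to behave like boto3's `Table.scan`, and the attribute name `text` and the helper `make_fake_scan` are hypothetical, used only so the example runs without AWS.

```python
from typing import Any, Callable, Dict, List


def count_rows_not_filled(scan: Callable[..., Dict[str, Any]], attribute: str) -> int:
    """Count items whose `attribute` is missing or falsy, following every Scan page."""
    rows_not_filled = 0
    kwargs: Dict[str, Any] = {}
    while True:
        page = scan(**kwargs)
        for item in page.get("Items", []):
            if not item.get(attribute):
                rows_not_filled += 1
        # DynamoDB includes LastEvaluatedKey only when more pages remain.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return rows_not_filled
        kwargs["ExclusiveStartKey"] = last_key


def is_table_complete(scan: Callable[..., Dict[str, Any]], attribute: str) -> bool:
    """Same check as in the question, but computed over all pages."""
    return count_rows_not_filled(scan, attribute) == 0


def make_fake_scan(items: List[Dict[str, Any]], page_size: int) -> Callable[..., Dict[str, Any]]:
    """Hypothetical stand-in for Table.scan that pages through an in-memory list."""
    def fake_scan(ExclusiveStartKey=None, **_ignored):
        start = ExclusiveStartKey["index"] if ExclusiveStartKey else 0
        page: Dict[str, Any] = {"Items": items[start:start + page_size]}
        if start + page_size < len(items):
            page["LastEvaluatedKey"] = {"index": start + page_size}
        return page
    return fake_scan


if __name__ == "__main__":
    rows = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}, {"id": 3, "text": ""}]
    # The empty row sits on the second page, so a first-page-only check would miss it.
    print(is_table_complete(make_fake_scan(rows, page_size=2), "text"))
```

Against a real table, the same function could be called with `boto3.resource("dynamodb").Table("my-table").scan`, where `my-table` is a placeholder name. Whether pagination is actually the cause here is an open question; comparing the count from this kind of loop with the count the state machine logs would help confirm or rule it out.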

Upvotes: 0

Views: 30

Answers (0)
