Whizzil

Reputation: 1364

Not receiving a message in Amazon SNS from Textract

I am using Amazon Textract's StartDocumentAnalysis function to asynchronously scan a .pdf file stored in an S3 bucket. According to the documentation, I should receive a notification about the job status on the SNS topic that I provide:

StartDocumentAnalysis returns a job identifier (JobId) that you use to get the results of the operation. When text analysis is finished, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that you specify in NotificationChannel.

The code that I'm using to start the analysis looks like this:

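    // Textract request/model classes from the AWS SDK for Java v1
    import com.amazonaws.services.textract.model.DocumentLocation
    import com.amazonaws.services.textract.model.NotificationChannel
    import com.amazonaws.services.textract.model.S3Object
    import com.amazonaws.services.textract.model.StartDocumentAnalysisRequest

    fun analyzeDocument(documentId: String) {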
        klogger.info { "Start Textract analysis on document '$documentId'" }

        val request = StartDocumentAnalysisRequest()
            .withFeatureTypes("TABLES", "FORMS")
            .withDocumentLocation(DocumentLocation()
                .withS3Object(S3Object()
                    .withName(documentId)
                    .withBucket(bucketName)
                )
            )
            .withNotificationChannel(NotificationChannel()
                .withSNSTopicArn(snsTopicArn)
                .withRoleArn(snsRoleArn)
            )

        val jobId = textract.startDocumentAnalysis(request).jobId

        klogger.info { "Analysis started for document '$documentId'. Job ID: '$jobId'" }
    }

I have created the SNS topic in the AWS console.

I can manually publish a message to that topic from the console, but no message from Textract ever arrives on it. I have already waited several hours, so I would expect the message to have arrived by now.

I am not sure whether the snsRoleArn is correct; I just used an arbitrary role that I already had in AWS. Could this be the problem? Which role should I use for snsRoleArn? If the role is not the issue, why am I not receiving a message?

Could I be missing something in the access policy?

    {
      "Version": "2008-10-17",
      "Id": "__default_policy_ID",
      "Statement": [
        {
          "Sid": "__default_statement_ID",
          "Effect": "Allow",
          "Principal": {
            "AWS": "*"
          },
          "Action": [
            "SNS:GetTopicAttributes",
            "SNS:SetTopicAttributes",
            "SNS:AddPermission",
            "SNS:RemovePermission",
            "SNS:DeleteTopic",
            "SNS:Subscribe",
            "SNS:ListSubscriptionsByTopic",
            "SNS:Publish",
            "SNS:Receive"
          ],
          "Resource": "arn:aws:sns:us-east-1:093475263507:textract-result.fifo",
          "Condition": {
            "StringEquals": {
              "AWS:SourceOwner": "093475263507"
            }
          }
        }
      ]
    }

Upvotes: 1

Views: 2944

Answers (3)

Tony BenBrahim

Reputation: 7290

Another issue you can run into while testing is reusing the same client token within a week. Later requests that reuse a client token within a week return the same job ID as the earlier request and do not trigger another SNS notification.

You can easily rule this out as the cause by temporarily using a UUID as the client token.
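
A minimal Kotlin sketch of that check, assuming the AWS SDK for Java v1 used in the question: generate a fresh UUID for each request and pass it to withClientRequestToken on StartDocumentAnalysisRequest. The helper names newClientToken and isValidClientToken are hypothetical, and the validation pattern reflects my reading of the Textract API reference (1 to 64 letters, digits, hyphens, or underscores).

```kotlin
import java.util.UUID

// Hypothetical helper: pattern assumed from the documented ClientRequestToken
// constraints (1-64 characters from A-Z, a-z, 0-9, '_' and '-').
val clientTokenPattern = Regex("^[A-Za-z0-9_-]{1,64}$")

// A new random UUID on every call, so repeated test runs never share a token
// and, per this answer, each run should start a fresh job and notification.
fun newClientToken(): String = UUID.randomUUID().toString()

// True if the token fits the assumed ClientRequestToken format.
fun isValidClientToken(token: String): Boolean = clientTokenPattern.matches(token)
```

In analyzeDocument this is one extra builder call, .withClientRequestToken(newClientToken()), added before startDocumentAnalysis(request). Once testing is done, a stable token per document is what gives you idempotent retries, so the random token is only a debugging aid.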

Upvotes: 0

Mohammed Sameer

Reputation: 51

I faced the same issue. Changing from a FIFO SNS topic to a standard SNS topic worked for me. I am not sure whether the mandatory .fifo naming convention is causing this behavior; I will update this answer once I get proper support from AWS.

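From your configuration, I can see that you have not followed the naming convention for an SNS topic that Textract publishes to: the topic name should start with AmazonTextract. This matters when the role uses the AWS managed AmazonTextractServiceRole policy, which only allows sns:Publish on topics matching arn:aws:sns:*:*:AmazonTextract*, so make sure you always prefix the topic name with AmazonTextract in that setup.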
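
Both points can be checked before calling StartDocumentAnalysis. A minimal sketch, assuming the standard ARN layout arn:<partition>:sns:<region>:<account-id>:<topic-name>; checkTextractTopic and TopicCheck are hypothetical names, not part of any AWS SDK. The prefix check is only relevant if your role relies on the AmazonTextractServiceRole managed policy.

```kotlin
// Result of a hypothetical pre-flight check on the NotificationChannel topic.
data class TopicCheck(val isFifo: Boolean, val hasTextractPrefix: Boolean)

fun checkTextractTopic(topicArn: String): TopicCheck {
    // Expected layout: arn:<partition>:sns:<region>:<account-id>:<topic-name>
    val parts = topicArn.split(":")
    require(parts.size == 6 && parts[0] == "arn" && parts[2] == "sns") {
        "Not an SNS topic ARN: $topicArn"
    }
    val topicName = parts[5]
    return TopicCheck(
        // FIFO topic names always end in ".fifo"; this answer reports that
        // switching to a standard topic fixed delivery.
        isFifo = topicName.endsWith(".fifo"),
        // Needed only when the role uses AmazonTextractServiceRole, which
        // allows sns:Publish on arn:aws:sns:*:*:AmazonTextract* topics.
        hasTextractPrefix = topicName.startsWith("AmazonTextract")
    )
}
```

Run against the ARN from the question's access policy, arn:aws:sns:us-east-1:093475263507:textract-result.fifo, it reports a FIFO topic without the AmazonTextract prefix, which matches both problems described in this answer.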

Upvotes: 5

smac2020

Reputation: 10734

Using an arbitrary IAM role for a specific task is not best practice. For this use case, use an IAM role that Textract is allowed to assume and that has a policy permitting it to publish to your SNS topic. I would try something like this:

[Screenshot not preserved]
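
The role passed to withRoleArn needs two pieces: a trust policy that lets textract.amazonaws.com assume it, and a permissions policy that allows sns:Publish on the topic. A sketch that renders both documents as Kotlin raw strings; the function names are hypothetical, and you would still attach the output through the IAM console or your own infrastructure tooling.

```kotlin
// Trust policy: lets the Textract service assume the role given as RoleArn.
fun textractTrustPolicy(): String = """
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": { "Service": "textract.amazonaws.com" },
          "Action": "sts:AssumeRole"
        }
      ]
    }
""".trimIndent()

// Permissions policy: allows publishing only to the topic given as SNSTopicArn.
fun snsPublishPolicy(topicArn: String): String = """
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "sns:Publish",
          "Resource": "$topicArn"
        }
      ]
    }
""".trimIndent()
```

Scoping Resource to the exact topic ARN, instead of the AmazonTextract* wildcard used by the managed policy, keeps the role least-privileged and removes the need for the AmazonTextract name prefix.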

Upvotes: 2
