falsePockets

Reputation: 4283

Crawl is not running in S3 event mode

When running an AWS Glue crawler that points to S3, the second log entry in CloudWatch is always:

Crawl is not running in S3 event mode

What is S3 event mode?

The name sounds like some way of getting S3 to invoke Glue for partial crawls after every object upload to the prefix. But as far as I can tell, such functionality does not exist. So what is this log entry referring to?

The closest thing I found in the Glue documentation was event-based triggers for Glue jobs, but Glue jobs are different from Glue crawlers.

Steps to reproduce

  1. Create a Glue crawler. Choose any configuration. Point it at any location in any S3 bucket, with any dataset (even an empty one).
  2. Run the crawler. It doesn't matter whether the crawl fails or succeeds.
  3. Open the logs for that crawl.
  4. Look at the second log entry:
2021-07-01T20:04:39.882+10:00
[6588c8ba-57e2-46e3-94b4-1bc4dfc5957d] BENCHMARK : Running Start Crawl for Crawler my-crawler
2021-07-01T20:04:40.200+10:00
[6588c8ba-57e2-46e3-94b4-1bc4dfc5957d] INFO : Crawl is not running in S3 event mode
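
The reproduction steps above could also be scripted. Below is a rough sketch using boto3-style clients, not a tested recipe. The crawler name, role ARN, database name and S3 path are hypothetical placeholders, and it assumes crawler logs land in the `/aws-glue/crawlers` log group, in a stream named after the crawler. The clients are passed in as arguments so the helpers stay independent of how credentials are set up.

```python
import time

# Assumption: crawler logs go to this log group, in a stream named after the crawler.
CRAWLER_LOG_GROUP = "/aws-glue/crawlers"


def create_s3_crawler(glue, name, role_arn, database, s3_path):
    """Step 1: create a crawler with a single S3 target (all arguments are placeholders)."""
    glue.create_crawler(
        Name=name,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )


def run_crawler_and_wait(glue, name, poll_seconds=10):
    """Step 2: start the crawl and poll until the crawler returns to the READY state."""
    glue.start_crawler(Name=name)
    while glue.get_crawler(Name=name)["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)


def crawl_log_messages(logs, name):
    """Step 3: fetch the crawler's log messages, oldest first."""
    response = logs.get_log_events(
        logGroupName=CRAWLER_LOG_GROUP,
        logStreamName=name,
        startFromHead=True,
    )
    return [event["message"] for event in response["events"]]


def event_mode_messages(messages):
    """Step 4: keep only the entries that mention S3 event mode."""
    return [message for message in messages if "S3 event mode" in message]
```

To try it against a real account, pass `boto3.client("glue")` as `glue` and `boto3.client("logs")` as `logs`, call the three functions in order, then run `event_mode_messages` on the result.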

Upvotes: 2

Views: 2088

Answers (2)

falsePockets

Reputation: 4283

AWS Support gave me an answer.

S3 event mode is functionality that is only available internally at AWS. As I suspected, it means S3 triggers a crawl for every file upload. But this functionality is not public at the moment.

Upvotes: 2

ScottishProgramer

Reputation: 1

I had the same problem and found a solution in this article: https://www.linkedin.com/pulse/my-top-5-gotchas-working-aws-glue-tanveer-uddin/

In short, the solution was to put aws-glue- in front of my bucket name. For example, a crawler would not work on a bucket called test-bucket, but once I renamed the bucket to aws-glue-test-bucket it worked.

Upvotes: 0
