Reputation: 4283
When running an AWS Glue crawler that points to S3, the second log entry in CloudWatch is always:
Crawl is not running in S3 event mode
What is S3 event mode?
The name sounds like some way of getting S3 to invoke Glue for partial crawls after every object upload to the prefix. But as far as I can tell, such functionality does not exist. So what is this log entry referring to?
The closest thing I found in the Glue documentation was event-based triggers for Glue jobs, but Glue jobs are different from Glue crawlers.
2021-07-01T20:04:39.882+10:00 [6588c8ba-57e2-46e3-94b4-1bc4dfc5957d] BENCHMARK : Running Start Crawl for Crawler my-crawler
2021-07-01T20:04:40.200+10:00 [6588c8ba-57e2-46e3-94b4-1bc4dfc5957d] INFO : Crawl is not running in S3 event mode
Upvotes: 2
Views: 2088
Reputation: 4283
AWS Support gave me an answer.
S3 event mode is functionality that currently exists only internally at AWS. As I suspected, it means S3 triggers a crawl for every file upload, but it is not publicly available at the moment.
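For later readers: as far as I know, the Glue API has since gained a documented recrawl behavior named `CRAWL_EVENT_MODE`, where the crawler reads S3 event notifications from an SQS queue and recrawls only what changed. As I understand it, this is not a push trigger; the crawler still has to be started on a schedule or on demand. The sketch below only builds the keyword arguments one might pass to `boto3.client("glue").create_crawler(...)`. The helper name and every name and ARN in it are hypothetical placeholders, and nothing is sent to AWS.

```python
def event_mode_crawler_args(name, role_arn, database, s3_path, queue_arn):
    """Build create_crawler kwargs for a crawler that consumes S3 events via SQS.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "S3Targets": [
                {
                    "Path": s3_path,
                    # SQS queue assumed to receive the bucket's S3 event notifications
                    "EventQueueArn": queue_arn,
                }
            ]
        },
        # Recrawl only objects reported by the events, not the whole prefix
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVENT_MODE"},
    }


# Placeholder values (assumptions), not real resources.
args = event_mode_crawler_args(
    name="my-crawler",
    role_arn="arn:aws:iam::123456789012:role/my-glue-role",
    database="my_database",
    s3_path="s3://my-bucket/my-prefix/",
    queue_arn="arn:aws:sqs:ap-southeast-2:123456789012:my-s3-events",
)
print(args["RecrawlPolicy"])
```

With boto3 installed and credentials configured, the dict could be passed as `glue.create_crawler(**args)`. The queue and the bucket's event notification have to be set up separately.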
Upvotes: 2
Reputation: 1
I had the same problem and found a solution in this article: https://www.linkedin.com/pulse/my-top-5-gotchas-working-aws-glue-tanveer-uddin/
In short, the solution was to put aws-glue- at the start of my bucket name. For example, a crawler would not work on a bucket called test-bucket, but it works once the bucket is named aws-glue-test-bucket.
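My understanding of why the prefix matters (an assumption, not something the article's author confirmed to me) is that the managed AWSGlueServiceRole policy only grants S3 object access to names matching aws-glue-*. If that is the cause, an alternative to renaming is to attach an extra policy to the crawler's role. The sketch below builds such a policy document as a plain dict; the function name and bucket name are hypothetical, and attaching the policy is left out.

```python
import json


def crawler_s3_read_policy(bucket):
    """Build an IAM policy document letting a role list and read one bucket.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the objects the crawler needs to inspect
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
            {
                # List keys under the bucket
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }


# "test-bucket" is the example bucket name from this answer.
policy = crawler_s3_read_policy("test-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON could then be attached to the crawler's IAM role as an inline policy, scoped down to a prefix if the crawler only needs part of the bucket.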
Upvotes: 0