Andrei Burlacu

Reputation: 3

AWS Glue S3Exception on MERGE INTO query

I'm fairly new to working with Glue jobs and I ran into this problem. I have two Glue ETL jobs. The first one processes a full export from a DynamoDB table, transforms and partitions the data, and writes it to an Iceberg table. The second one takes the latest CDC data from an S3 path and runs a MERGE INTO query to upsert the data. The first job works fine, but the second Glue job fails with S3Exception: Please reduce your request rate. The original tables are around 1 TB in size according to the DynamoDB console. The tables are partitioned by one table column into 1024 bucket prefixes. The Glue configuration is 150 G.2X workers.

I tried changing the partitioning (both lowering and increasing it), but nothing seems to work.

Upvotes: 0

Views: 585

Answers (1)

parisni

Reputation: 1152

S3Exception: Please reduce your request rate

This is most likely what is also referred to as the S3 "slow down" issue. While you cannot fix this on the S3 side, you can configure how Spark accesses S3, which is based on the Hadoop library.

You can either:

  1. Increase the number of retries: spark.hadoop.fs.s3.maxRetries=50
  2. Use the AIMD (additive-increase/multiplicative-decrease) retry method: spark.hadoop.fs.s3.aimd.enabled=true

More details are here in the AWS docs; the advice is not specific to EMR and works in general with Spark and S3.

Note that, depending on the protocol you are using, you might need to replace s3 with either s3a or s3n in the Spark conf keys.
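As a rough sketch of how the two settings above could be assembled for a given scheme: the helper names `s3_retry_conf` and `as_submit_args` are made up for illustration, and swapping the prefix to s3a or s3n simply follows the note above. Whether a given connector actually honors the resulting key is an assumption you should check for your setup.

```python
def s3_retry_conf(scheme="s3", max_retries=50, aimd=True):
    """Build the Spark conf entries mentioned above for one S3 scheme.

    Hypothetical helper: it only formats key/value pairs and does not
    verify that the chosen connector supports these keys.
    """
    prefix = f"spark.hadoop.fs.{scheme}"
    conf = {f"{prefix}.maxRetries": str(max_retries)}
    if aimd:
        conf[f"{prefix}.aimd.enabled"] = "true"
    return conf


def as_submit_args(conf):
    """Render a conf dict as spark-submit style '--conf key=value' arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the arguments for the default "s3" scheme.
    print(" ".join(as_submit_args(s3_retry_conf())))
```

The same dict can also be applied key by key when building the Spark session or config in the job script; how the values are passed into a Glue job (job parameters versus code) depends on your job setup.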

Upvotes: 0
