Reputation: 3
I'm pretty new to working with Glue jobs and I encountered this problem.
I have 2 Glue ETL jobs. The first one processes a full export from a DynamoDB table, transforms and partitions the data, and writes it to an Iceberg table. The second one takes the latest CDC from an S3 path and performs a MERGE INTO query to upsert the data.
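For context, a minimal sketch of what the second job's upsert step could look like in PySpark (all names here are hypothetical placeholders: glue_catalog, mydb.target, cdc_updates, and pk are not from the original post):

```python
# Sketch of an Iceberg MERGE INTO upsert, as the question describes.
# Catalog, table, view, and key names are placeholders for illustration.
merge_sql = """
MERGE INTO glue_catalog.mydb.target AS t
USING cdc_updates AS s
ON t.pk = s.pk
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""
# In the Glue job this statement would be executed with spark.sql(merge_sql),
# after registering the latest CDC batch as the temp view `cdc_updates`.
```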
The first job works fine, but the second Glue job fails with S3Exception: Please reduce your request rate.
The original tables in the DynamoDB console are around 1 TB in size.
The tables are partitioned by one table column into 1024 bucket prefixes.
Glue configuration: 150 G.2X workers.
I tried adjusting the partitioning (both lowering and increasing it), but nothing seems to work.
Upvotes: 0
Views: 585
Reputation: 1152
S3Exception: Please reduce your request rate
This likely refers to an S3 slowdown (throttling) issue. While you cannot fix this on the S3 side, you can configure Spark's S3 access, which is based on the Hadoop library.
You can set either:
spark.hadoop.fs.s3.maxRetries=50
spark.hadoop.fs.s3.aimd.enabled=true
More details are in the AWS docs; although the page is specific to EMR, the settings work in general with Spark and S3.
Note that depending on the protocol you are using, you might need to replace s3 with s3a or s3n in the Spark conf.
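In a Glue job, these settings can be passed through the job's --conf parameter (a sketch, not a definitive setup: Glue accepts multiple settings by chaining --conf inside the parameter value, and the fs.s3 prefix may need to be fs.s3a or fs.s3n to match the scheme of your S3 paths):

```
--conf spark.hadoop.fs.s3.maxRetries=50 --conf spark.hadoop.fs.s3.aimd.enabled=true
```

The same keys can also be set programmatically on the SparkSession builder via .config(key, value) before the session is created.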
Upvotes: 0