shubhamkakran

Reputation: 31

How can I avoid OOM errors in an AWS Glue job in PySpark?

I am getting this error while running an AWS Glue job with 40 workers processing 40 GB of data:

Caused by: org.apache.spark.memory.SparkOutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@5fa14240 : No space left on device

How can I optimize my job to avoid this error in PySpark?

Here is a screenshot of the job metrics: glue_metrics

Upvotes: 3

Views: 1713

Answers (1)

semaphore

Reputation: 99

Use the AWS Glue Spark shuffle manager with Amazon S3. The "No space left on device" error means the sorter is spilling shuffle data to the workers' local disks and filling them up; the shuffle manager writes shuffle and spill files to S3 instead, so the job is no longer limited by local disk capacity.

This requires Glue 2.0 or later.

See the following links.
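As a sketch, enabling it comes down to adding job parameters to the Glue job; the parameter names below are from the AWS Glue documentation, while the bucket and prefix are placeholders you would replace with your own:

```
# Glue 2.0+ job parameters (Job details -> Job parameters, or --default-arguments)
--write-shuffle-files-to-s3    true
--write-shuffle-spills-to-s3   true
--conf                         spark.shuffle.glue.s3ShuffleBucket=s3://your-shuffle-bucket/your-prefix/
```

The S3 bucket must be in the same region as the job, and the job's IAM role needs read/write access to it. Expect some slowdown versus local-disk shuffle, since shuffle blocks now travel over the network to S3.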

Upvotes: 1
