Reputation: 31
I am getting this error while running an AWS Glue job with 40 workers processing 40 GB of data:
Caused by: org.apache.spark.memory.SparkOutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@5fa14240 : No space left on device
How can I optimize my PySpark job to avoid this error?
Here is a picture of the metrics: glue_metrics
Upvotes: 3
Views: 1713
Reputation: 99
Use the AWS Glue Spark shuffle manager with Amazon S3. It writes shuffle and spill files to S3 instead of the workers' local disks, which avoids the "No space left on device" error when shuffle data exceeds local storage. This requires Glue version 2.0 or later.
See the following links:
https://awscloudfeed.com/whats-new/big-data/introducing-amazon-s3-shuffle-in-aws-glue
https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-shuffle-manager.html
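As a rough sketch, the S3 shuffle manager is enabled through Glue job parameters like the ones below. The bucket name and prefix are placeholders, and the exact parameter names can vary by Glue version, so check them against the AWS documentation linked above:

```shell
# AWS Glue (2.0) job parameters to offload shuffle data to S3
# instead of local worker disks. Bucket/prefix are placeholders.
--write-shuffle-files-to-s3  true
--write-shuffle-spills-to-s3 true
--conf spark.shuffle.glue.s3ShuffleBucket=s3://your-shuffle-bucket/prefix/
```

These can be set under "Job parameters" in the Glue console or via the `DefaultArguments` of the job definition.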
Upvotes: 1