CRM

Reputation: 1349

Spark partitions taking uneven time to execute with frequent executor lost failure

My Spark application processes an input CSV file in several stages. At each stage, the RDDs created are large (MBs to GBs). I have created different numbers of partitions and tried them, but the partitions always complete a stage unevenly: some partitions finish very quickly, while the last few always take a long time, repeatedly throwing heartbeat timeouts and executor lost failures and retrying.

I have tried changing the number of partitions to several different values, but the last few partitions never complete. I have not been able to fix this despite trying for a long time.
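One way to make the skew visible is to count the records in each partition; a minimal sketch (the name partitionSizes is illustrative, and rdd stands in for one of the stage RDDs):

import org.apache.spark.rdd.RDD

// Quick diagnostic: record count per partition. A large spread between the
// smallest and largest counts points to data skew rather than slow executors.
def partitionSizes[T](rdd: RDD[T]): Array[(Int, Long)] =
  rdd.mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size.toLong)))
     .collect()
     .sortBy(_._1)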

How do I handle this?

Upvotes: 0

Views: 770

Answers (1)

zero323

Reputation: 330183

Most likely the source of the problem is this part:

aggregateByKey(Vector.empty[(Long, Int)])(_ :+ _, _ ++ _)

Basically, what you're doing is a significantly less efficient version of groupByKey. If the distribution of keys is skewed, then the distribution after the aggregation will be skewed as well, resulting in uneven load across machines.
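For comparison, a minimal sketch (assuming a hypothetical pairs: RDD[(String, (Long, Int))], matching the zero value above) showing that the two forms do the same work, materializing every value for a key in memory:

import org.apache.spark.rdd.RDD

def compare(pairs: RDD[(String, (Long, Int))]): Unit = {
  // Both lines below build the complete collection of values per key,
  // so a hot key concentrates its entire payload on a single task:
  val grouped  = pairs.groupByKey() // RDD[(String, Iterable[(Long, Int)])]
  val emulated = pairs.aggregateByKey(Vector.empty[(Long, Int)])(_ :+ _, _ ++ _)
}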

Moreover, if the data for a single key doesn't fit into main memory, the whole process will fail, for the same reason it would with groupByKey.
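If only a per-key summary is needed downstream (an assumption; the question doesn't say what the vectors are used for), the values can instead be folded into a fixed-size accumulator, so memory per key stays constant no matter how skewed the keys are. A sketch:

import org.apache.spark.rdd.RDD

// Hypothetical alternative: reduce each key's (Long, Int) values to a
// (sum, count) pair. The accumulator has constant size, so a hot key no
// longer needs to fit all of its values in one executor's memory, and
// map-side combining shrinks the shuffle.
def summarize(pairs: RDD[(String, (Long, Int))]): RDD[(String, (Long, Long))] =
  pairs.aggregateByKey((0L, 0L))(
    { case ((sum, n), (x, _)) => (sum + x, n + 1) },     // fold one value in
    { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) }  // merge partials
  )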

Upvotes: 3