solistice

Reputation: 63

How can I further reduce my Apache Spark task size

I'm trying to run the following Scala code on Spark, but I get a warning about an extremely large task size (about 8 MB).

tidRDD: RDD[ItemSet]
mh: MinerHelper
x: ItemSet
broadcast_tid: Broadcast[Array[ItemSet]]
count: Int

tidRDD.flatMap(x => mh.mineFreqSets(x, broadcast_tid.value, count)).collect()

The reason I added the MinerHelper class was to make the code serialisable, and it only contains the method shown above. An ItemSet is a class with 3 private members and a few getter/setter methods, nothing out of the ordinary. I feel this is the correct way to approach the problem, but Spark seems to think otherwise. Am I making some glaring error, or is it something small that's wrong?
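For context, the two classes are shaped roughly like this (a simplified sketch only; the real member names and the mining logic are omitted):

class ItemSet(private var items: Array[Int]) extends Serializable {
  // 3 private members and getters/setters in the real class; one shown here
  def getItems: Array[Int] = items
  def setItems(i: Array[Int]): Unit = { items = i }
}

class MinerHelper extends Serializable {
  // the only method in the helper: mines frequent sets for one transaction
  def mineFreqSets(x: ItemSet, tids: Array[ItemSet], count: Int): Seq[ItemSet] = {
    // ... actual mining logic ...
    Seq.empty
  }
}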

Here's the warning:

WARN TaskSetManager: Stage 1 contains a task of very large size (8301 KB). The maximum recommended task size is 100 KB.

Upvotes: 3

Views: 3302

Answers (1)

Lomig Mégard

Reputation: 1828

You're probably closing over `this`, which forces the whole enclosing object to be serialized with each task.

You probably have something like the following:

class Foo {
  val outer = ??? 
  def f(rdd: RDD[ItemSet]): RDD[ItemSet] = {
    rdd.map(x => outer.g(x))
  }
}

In this case, during the serialization of the task, Spark needs to serialize the whole enclosing Foo instance. Indeed, when you reference `outer`, you really mean `this.outer`.

A simple fix is to copy the outer values into local variables first:

class Foo {
  val outer = ??? 
  def f(rdd: RDD[ItemSet]): RDD[ItemSet] = {
    val _outer = outer         // local variable
    rdd.map(x => _outer.g(x))  // no reference to `this`
  }
}
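Applied to the call in the question, the same idea would look roughly like this (a sketch reusing the names from the question; it assumes mineFreqSets takes the arguments shown there and that MinerHelper itself extends Serializable):

// Copy everything the closure needs into local vals so the lambda captures
// only these small references instead of the enclosing object (`this`).
val localMh = mh
val localCount = count
val localTid = broadcast_tid   // a Broadcast handle is tiny to serialize

val freqSets = tidRDD
  .flatMap(x => localMh.mineFreqSets(x, localTid.value, localCount))
  .collect()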

Upvotes: 1
