Reputation: 1
I have an RDD and want to group the data based on multiple columns. For a large dataset, Spark fails with a heap space error whether I use combineByKey, groupByKey, reduceByKey, or aggregateByKey. Can you suggest another method for resolving this using Scala's API?
Upvotes: 0
Views: 732
Reputation: 8036
You may want to use treeReduce() for doing an incremental reduce in Spark. However, your hypothesis that Spark cannot work on large datasets is not true, and I suspect you just don't have enough partitions in your data, so a repartition() may be all you need.
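Note that treeReduce() collapses the whole RDD to a single value, so for grouping by multiple columns the usual pattern is to key by a tuple of those columns and call reduceByKey() with an explicit (larger) partition count. Here is a minimal sketch; the field names (dept, city, amount), the sample data, and the partition count of 200 are all hypothetical and should be adapted to your dataset:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GroupByManyColumns {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("group-sketch").setMaster("local[*]"))

    // Hypothetical rows of (dept, city, amount).
    val rows = sc.parallelize(Seq(
      ("sales", "NYC", 10.0),
      ("sales", "NYC", 5.0),
      ("eng",   "SF",  7.5)
    ))

    // Key by the grouping columns as a tuple, then reduce per key.
    // Passing an explicit numPartitions spreads the shuffle across more,
    // smaller partitions, which is what helps avoid heap-space errors
    // when the default partitioning is too coarse.
    val grouped = rows
      .map { case (dept, city, amount) => ((dept, city), amount) }
      .reduceByKey(_ + _, numPartitions = 200)

    grouped.collect().foreach(println)
    sc.stop()
  }
}
```

Unlike groupByKey(), reduceByKey() combines values on the map side before shuffling, so far less data ends up in any single executor's heap; combined with a higher partition count, this usually resolves the error without changing the logic.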
Upvotes: 1