Reputation: 2251
In Apache Spark, repartition(n) allows partitioning the RDD into exactly n partitions.
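For example (a minimal sketch, assuming an existing SparkContext sc; repartition only fixes the number of partitions, not which elements end up where):

data = sc.parallelize(range(10))
data.repartition(4).getNumPartitions()   # 4, but element placement is not controlled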
But how can the given RDD be partitioned so that all partitions (except possibly the last) have a specified number of elements, given that the number of elements in the RDD is not known and calling .count() is expensive?
C = sc.parallelize([x for x in range(10)],2)
Let's say internally, C = [[0,1,2,3,4,5], [6,7,8,9]]
C = someCode(3)
Expected:
C = [[0,1,2], [3,4,5], [6, 7, 8], [9]]
Upvotes: 0
Views: 833
Reputation: 212
This is quite easy in pyspark:
C = sc.parallelize([x for x in range(10)], 2)
# partitionBy works on pair RDDs, so pair each element with itself as the key
rdd = C.map(lambda x: (x, x))
# custom partition function: int(x * 4 / 11) sends keys 0..9 to buckets 0..3, three keys per bucket
C_repartitioned = rdd.partitionBy(4, lambda x: int(x * 4 / 11)).map(lambda x: x[0]).glom().collect()
C_repartitioned
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
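The partition function int(x * 4 / 11) simply buckets the consecutive keys 0..9 into 4 ranges of at most 3, so it bakes in knowledge of the element count. A more general sketch of the same trick, assuming the keys are the integers 0..n-1 and that n is known (chunk_size, n and num_chunks are names introduced here just for illustration):

chunk_size = 3
n = 10                                  # total number of elements (known in this toy example)
num_chunks = -(-n // chunk_size)        # ceiling division -> 4 chunks
pairs = sc.parallelize(range(n), 2).map(lambda x: (x, x))
chunks = (pairs.partitionBy(num_chunks, lambda k: k // chunk_size)
               .map(lambda kv: kv[0])
               .glom()
               .collect())
# chunks == [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Note that this still needs n to size the number of partitions, which is exactly what the question tries to avoid; for arbitrary elements you would also first have to attach a consecutive index (e.g. with zipWithIndex(), which itself triggers a job).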
This technique is called custom partitioning. More on it:
http://sparkdatasourceapi.blogspot.ru/2016/10/patitioning-in-spark-writing-custom.html
http://baahu.in/spark-custom-partitioner-java-example/
Upvotes: 0