Reputation: 237
I want to test the behavior of repartition() and coalesce() myself, especially in the less common case where numPartitions stays unchanged: I'd like to see whether calling repartition with the same number of partitions still does a full shuffle of all the data. Then I realized I have no way to check the exact contents of each partition. I'm just using a parallelized list as my sample RDD. Is there any way to inspect the contents of each partition so I can verify my doubts?
Or is there perhaps a more recent API better suited to this?
Thanks in advance.
Upvotes: 3
Views: 1171
Reputation: 45309
You can use RDD.glom(), which "returns an RDD created by coalescing all elements within each partition into an array."
For example, the following 8-partition RDD (the partition count is passed explicitly so the layout does not depend on the default parallelism) can be inspected with:
val rdd = sc.parallelize(Seq(1,2,3,4,5,6,7,8,9,10), 8)
rdd.glom().collect()
//Result
res3: Array[Array[Int]] = Array(Array(1), Array(2), Array(3), Array(4, 5),
Array(6), Array(7), Array(8), Array(9, 10))
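To check the original doubt directly, a sketch like the one below could print each partition's contents before and after repartition(8) and coalesce(8) on the same 8-partition RDD. It uses mapPartitionsWithIndex instead of glom() so that every partition is tagged with its index, and toDebugString to show whether a shuffle stage appears in the lineage. The object name PartitionInspector, the helper layout, and the local[8] master are illustrative assumptions, not part of the answer above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionInspector {
  // Hypothetical helper: pair each partition's index with its elements,
  // then bring everything back to the driver (only safe for small test data).
  def layout[T](rdd: RDD[T]): Array[(Int, List[T])] =
    rdd.mapPartitionsWithIndex { (idx, it) => Iterator((idx, it.toList)) }.collect()

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this experiment.
    val conf = new SparkConf().setMaster("local[8]").setAppName("inspect-partitions")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 8)
      val reshuffled = rdd.repartition(8) // same partition count as the input
      val coalesced = rdd.coalesce(8)     // coalesce without shuffle (the default)

      println("original:");    layout(rdd).foreach(println)
      println("repartition:"); layout(reshuffled).foreach(println)
      println("coalesce:");    layout(coalesced).foreach(println)

      // The lineage shows whether a ShuffledRDD (a shuffle stage) was introduced.
      println(reshuffled.toDebugString)
      println(coalesced.toDebugString)
    } finally sc.stop()
  }
}
```

The Spark documentation states that repartition always shuffles all data (it is coalesce with shuffle = true), so even with an unchanged partition count its lineage should include a ShuffledRDD, while coalesce(8) without a shuffle should not. Where individual elements land after repartition is an implementation detail, so it is better to compare the layouts than to rely on a particular placement.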
Upvotes: 6