Kanav Sharma

Reputation: 307

Spark mapPartitionsWithIndex : Identify a partition

Identify a partition :

mapPartitionsWithIndex(index, iter)

This method applies a function to each partition. I understand that we can track the partition using the "index" parameter.

Numerous examples use this method to remove the header from a data set with an "index = 0" condition. But how can we be sure that the first partition read (that is, the one whose "index" equals 0) is indeed the one that contains the header? Isn't the index random, or based on the partitioner, if one is used?

Upvotes: 3

Views: 5977

Answers (1)

Balaji Reddy

Reputation: 5700

Isn't it random or based upon the partitioner, if used?

It is not random: the index is the partition number. The simple example below illustrates this.

val base = sc.parallelize(1 to 100, 4)
base.mapPartitionsWithIndex((index, iterator) => {
  // Tag every element with the index of the partition it belongs to.
  iterator.map { x => (index, x) }
}).foreach { x => println(x) }

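Result: (0,1) (1,26) (2,51) (1,27) (0,2) (0,3) (0,4) (1,28) (2,52) (1,29) (0,5) (1,30) (1,31) (2,53) (1,32) (0,6) ... ...

The pairs are interleaved because foreach processes the partitions in parallel, but the assignment itself is stable: partition 0 holds 1 to 25, partition 1 holds 26 to 50, and so on.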
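To tie this back to the header question, here is a minimal sketch of the "index == 0" pattern. It assumes a spark-shell session where `sc` is already defined, and the sample rows are made up for illustration. Since `parallelize` slices the collection in order, the first element (standing in for the header line) ends up in partition 0. This only holds while the data keeps its original partitioning. After a repartition or any other shuffle, the header line could sit in any partition.

```scala
// Assumes spark-shell, where `sc` (the SparkContext) already exists.
// Hypothetical sample data: the first line plays the role of a CSV header.
val lines = sc.parallelize(Seq("id,name", "1,alice", "2,bob", "3,carol"), 2)

// Drop the first element of partition 0 only; all other partitions pass through unchanged.
val withoutHeader = lines.mapPartitionsWithIndex { (index, iterator) =>
  if (index == 0) iterator.drop(1) else iterator
}

// collect() concatenates the partitions in index order, so the row order is stable.
val rows = withoutHeader.collect()
rows.foreach(println)
```

Using `collect()` here rather than `foreach` on the RDD avoids the interleaved output seen above, because the rows are brought back to the driver in partition order before printing.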

Upvotes: 7
