Slim AZAIZ

Reputation: 656

Why is the number of partitions in sortByKey() not equal to one by default?

When I execute:

list.sortByKey.take(10).foreach(println)

the result is not correct. However, when I change it to:

list.sortByKey(false,1).take(10).foreach(println)

I get a correct result.

Upvotes: 0

Views: 710

Answers (2)

vaquar khan

Reputation: 11449

1)

  xxx.sortByKey().foreach(println)

foreach runs in parallel across the partitions, so the output order is not guaranteed. Elements from different partitions may be interleaved.

2)

The following code prints in order only because it forces a single partition; without that, the ordering breaks on a cluster or with more than one worker.

 xxx.sortByKey(numPartitions=1).foreach(println)

3)

  xxx.sortByKey().collect

collect returns an array with the partitions concatenated in their sorted order.
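The difference between the three cases can be modelled in plain Scala, without Spark. This is only a sketch: `PartitionedSortModel` and its methods are hypothetical names, and the even-sized split stands in for the key ranges that a range-partitioned sort produces.

```scala
// Plain-Scala model of a sorted, range-partitioned data set (no Spark dependency).
// All names here are illustrative assumptions, not Spark APIs.
object PartitionedSortModel {
  type Pair = (Int, String)

  // Sort by key, then cut the result into contiguous chunks.
  // Each chunk plays the role of one partition holding one key range.
  def sortIntoPartitions(data: Seq[Pair], numPartitions: Int): Vector[Vector[Pair]] = {
    val sorted = data.sortBy(_._1).toVector
    val size = math.max(1, math.ceil(sorted.length.toDouble / numPartitions).toInt)
    sorted.grouped(size).toVector
  }

  // Models collect: partitions are concatenated in partition order,
  // so the result is globally sorted.
  def collectAll(parts: Vector[Vector[Pair]]): Vector[Pair] = parts.flatten

  // Models foreach on several workers: partitions are visited in whatever
  // order the tasks happen to run, given here as a list of partition indices.
  def visitInTaskOrder(parts: Vector[Vector[Pair]], taskOrder: Seq[Int]): Vector[Pair] =
    taskOrder.toVector.flatMap(i => parts(i))
}
```

With six pairs split into three partitions, `collectAll` yields the keys in ascending order, while `visitInTaskOrder(parts, Seq(2, 0, 1))` yields the last key range first. Every partition is sorted internally, but the overall output is not.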

Upvotes: 1

Avishek Bhattacharya

Reputation: 6974

You can do that by assigning the parameter explicitly by name, like this:

 list.rdd.sortByKey(numPartitions = 1).take(10).foreach(println)

This should work.
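Note that the positional call `sortByKey(false, 1)` from the question sets two things at once: `ascending = false` and `numPartitions = 1`. The named form changes only the partition count. A minimal sketch of that difference follows, using a hypothetical `sortSketch` function that only mimics the parameter shape of `sortByKey`. The default of 4 partitions is an arbitrary placeholder; Spark's own default is the partition count of the parent RDD.

```scala
// Hypothetical stand-in with the same parameter shape as sortByKey:
// an ascending flag first, then a partition count, both with defaults.
object NamedArgsModel {
  def sortSketch(
      data: Seq[Int],
      ascending: Boolean = true,
      numPartitions: Int = 4 // placeholder default, an assumption for this sketch
  ): (Seq[Int], Int) = {
    val sorted = if (ascending) data.sorted else data.sorted.reverse
    // Return the sorted data together with the partition count that was requested.
    (sorted, numPartitions)
  }
}
```

Calling `sortSketch(data, false, 1)` sorts in descending order with one partition, whereas `sortSketch(data, numPartitions = 1)` keeps the ascending default. If the two calls in the question give different results, the sort direction is worth checking before the partition count.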

Upvotes: 0
