Reputation: 656
When I execute:
list.sortByKey.take(10).foreach(println)
the result is not correct. However, when I modify it to:
list.sortByKey(false,1).take(10).foreach(println)
I get a correct result.
Upvotes: 0
Views: 710
Reputation: 11449
1)
xxx.sortByKey().foreach(println)
`foreach` runs in parallel across the partitions, so you will not get the elements in sorted order. Output from different partitions may be interleaved.
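A plain-Scala simulation of why `foreach` loses the order. No Spark is involved: the hard-coded partitions and the use of `Future` tasks as stand-ins for executor tasks are assumptions for illustration. Each partition is already sorted and the partitions hold increasing key ranges, but they are processed concurrently, so the order in which elements reach the shared output depends on thread scheduling.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.jdk.CollectionConverters._

object ParallelForeachSketch {
  // Hypothetical range-partitioned data: each inner Vector is one sorted
  // partition, and partition i only holds keys smaller than partition i + 1.
  val partitions: Vector[Vector[Int]] =
    Vector(Vector(1, 2, 3), Vector(4, 5, 6), Vector(7, 8, 9))

  // Mimics rdd.foreach(println): each partition is handled by its own task,
  // and all tasks append to one shared sink (standing in for stdout).
  def parallelForeach(parts: Vector[Vector[Int]]): List[Int] = {
    val sink = new ConcurrentLinkedQueue[Int]()
    val tasks = parts.map(p => Future(p.foreach(x => sink.add(x))))
    Await.result(Future.sequence(tasks), 10.seconds)
    sink.asScala.toList
  }

  def main(args: Array[String]): Unit = {
    val out = parallelForeach(partitions)
    // Every element is emitted exactly once, so this is always true ...
    println(out.sorted == partitions.flatten)
    // ... but whether `out` itself is sorted depends on thread scheduling.
    println(out)
  }
}
```

The first line printed by `main` is always `true`; the second is not guaranteed to come out in sorted order from one run to the next.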
2)
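The following code works only because it forces a single partition, and it starts breaking on a cluster or with more than one worker (on a cluster, `println` inside `foreach` runs on the executors, so its output goes to the executors' stdout rather than the driver's console):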
xxx.sortByKey(numPartitions=1).foreach(println)
3)
xxx.sortByKey().collect
`collect` returns an array of all the elements, with the partitions concatenated in their sorted order, so the result is fully sorted.
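The same kind of simulation for `collect`, as a sketch under assumptions rather than Spark's implementation: `rangePartition` is a hypothetical stand-in for the range partitioning that `sortByKey` performs, using hand-picked split points instead of sampled ones. Because every key in partition `i` is smaller than every key in partition `i + 1`, sorting each partition independently and then concatenating the partitions in index order yields a fully sorted result.

```scala
object SortByKeySketch {
  type KV = (Int, String)

  // Hypothetical stand-in for range partitioning: a key goes to the first
  // partition whose upper bound it is below, or to the last partition.
  def rangePartition(data: Seq[KV], bounds: Seq[Int]): Vector[Vector[KV]] = {
    val buckets = Vector.fill(bounds.size + 1)(Vector.newBuilder[KV])
    for (kv <- data) {
      val i = bounds.indexWhere(kv._1 < _) match {
        case -1 => bounds.size // larger than every bound: last partition
        case j  => j
      }
      buckets(i) += kv
    }
    buckets.map(_.result())
  }

  // Each partition is sorted on its own, as the per-partition sort tasks would do.
  def sortWithinPartitions(parts: Vector[Vector[KV]]): Vector[Vector[KV]] =
    parts.map(_.sortBy(_._1))

  // Simulated collect(): concatenate the partitions in partition-index order.
  def collect(parts: Vector[Vector[KV]]): Vector[KV] = parts.flatten

  def main(args: Array[String]): Unit = {
    val data = Seq(5 -> "e", 1 -> "a", 9 -> "i", 3 -> "c", 7 -> "g", 2 -> "b")
    val parts = sortWithinPartitions(rangePartition(data, Seq(4, 8)))
    println(parts) // one sorted inner Vector per partition
    println(collect(parts)) // Vector((1,a), (2,b), (3,c), (5,e), (7,g), (9,i))
  }
}
```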
Upvotes: 1
Reputation: 6974
You can do that by passing `numPartitions` explicitly as a named parameter, which leaves `ascending` at its default, like this:
list.rdd.sortByKey(numPartitions = 1).take(10).foreach(println)
This should work.
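In Spark's Scala API, `sortByKey` declares `ascending: Boolean = true` as its first parameter and `numPartitions: Int` as its second, both with defaults. A positional call such as `sortByKey(false, 1)` therefore also switches to descending order, while a named argument sets only `numPartitions`. The sketch below shows the same Scala mechanics on a hypothetical function `sortByKeyLike` (not a Spark API; its default of 4 partitions is an arbitrary assumption) so it runs without Spark.

```scala
object NamedArgsSketch {
  // Hypothetical function whose parameter list mirrors the shape of
  // sortByKey(ascending = true, numPartitions = <default>); it just returns
  // the values it received so the effect of each call style is visible.
  def sortByKeyLike(ascending: Boolean = true, numPartitions: Int = 4): (Boolean, Int) =
    (ascending, numPartitions)

  def main(args: Array[String]): Unit = {
    println(sortByKeyLike())                  // (true,4): both defaults
    println(sortByKeyLike(false, 1))          // (false,1): positional, order flips to descending
    println(sortByKeyLike(numPartitions = 1)) // (true,1): named, ascending keeps its default
  }
}
```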
Upvotes: 0