Anand

Reputation: 3

Scala repeated sorting in the result

I have an RDD containing "Customer" and "Amt Spent". I am trying to perform a simple sort to order by "Amt Spent". When I view the results, I see the sorting happen multiple times.

As you can see from the result, the first 5 entries are sorted by Amt Spent, and then the sorting starts over again. What could be the issue here?

Upvotes: 0

Views: 62

Answers (1)

The problem is not that the sort did not work, but that you called println() inside a foreach on an RDD. That operation runs in parallel on all partitions, so each partition prints its own sorted slice and the slices show up interleaved in the output.
And on a real cluster (as opposed to a local development environment) you would not even see the printed lines, because the printing would happen on the executors' JVMs, not on the driver.

The RDD is sorted, but if you don't want to take my word for it (which would be smart), you can call collect before the foreach. That fetches all results to the driver first and then prints them there, in the order they have in the RDD.
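
Here is a minimal sketch of that, since I can't see your exact code. The customer names, the amounts, the object name SortedPrintSketch and the helper sortedOnDriver are all made up for illustration, and I am assuming the RDD holds (customer, amount) pairs and that you run in local mode:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SortedPrintSketch {

  // Hypothetical helper: sorts (customer, amount) pairs by amount and
  // brings the sorted result back to the driver as an Array.
  def sortedOnDriver(sc: SparkContext, rows: Seq[(String, Double)]): Array[(String, Double)] = {
    // Several partitions on purpose, to mimic a distributed RDD.
    val rdd = sc.parallelize(rows, numSlices = 3)

    // sortBy produces a globally sorted RDD, but calling
    // sorted.foreach(println) here would print from each partition in
    // parallel, so the lines could appear interleaved.
    val sorted = rdd.sortBy(_._2)

    // collect() returns the elements partition by partition, in order,
    // so the Array on the driver keeps the sort order.
    sorted.collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for trying this out on one machine.
    val conf = new SparkConf().setAppName("sorted-print-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data.
      val rows = Seq(("c1", 40.5), ("c2", 12.0), ("c3", 99.9), ("c4", 7.25), ("c5", 63.0))
      // This foreach runs on a local Array on the driver, not on the RDD.
      sortedOnDriver(sc, rows).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Keep in mind that collect pulls the whole RDD into the driver's memory. If the RDD is large and you only want to check the order, sorted.take(n) returns just the first n elements to the driver.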

Upvotes: 1
