user6107173

Reputation: 609

Sorting in Spark Physical Plan

I'm interested in the top two sorts in this screenshot. The column uniqid#2509 appears in both. Does this sort

+- *Sort [uniqid#2509 ASC NULLS FIRST, __$start_lsn#2483 DESC NULLS LAST, __$seqval#2484 DESC NULLS LAST], false, 0

benefit from the second sort

:- *Sort [uniqid#2509 ASC NULLS FIRST], false, 0

or is the column uniqid#2509 sorted twice?

Thanks


Upvotes: 0

Views: 257

Answers (1)

DaRkMaN

Reputation: 1054

The first sort (counting from the bottom) is a reducer-side sort of the mapper output.
In a sort-merge join, the data is first partitioned by the hash of the join key and then sorted within each partition on the reducer side, so that the two sides can be merged.
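
As a rough illustration of the partition, sort, merge sequence described above, here is a minimal pure-Python sketch. It is a toy model, not Spark's implementation: the function names (`hash_partition`, `merge_join`, `sort_merge_join`) and the `(key, value)` row layout are assumptions made up for this example.

```python
from collections import defaultdict

def hash_partition(rows, num_partitions):
    """Group (key, value) rows into buckets by hash of the key (the 'shuffle')."""
    parts = defaultdict(list)
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def merge_join(left, right):
    """Inner-join two lists of (key, value) rows that are both sorted by key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], right[k][1]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out

def sort_merge_join(left, right, num_partitions=2):
    lparts = hash_partition(left, num_partitions)
    rparts = hash_partition(right, num_partitions)
    result = []
    for p in range(num_partitions):
        # This per-partition sort plays the role of the lower Sort node in the plan.
        l_sorted = sorted(lparts.get(p, []), key=lambda r: r[0])
        r_sorted = sorted(rparts.get(p, []), key=lambda r: r[0])
        result.extend(merge_join(l_sorted, r_sorted))
    return result
```

Because matching keys always land in the same partition, each partition can be sorted and merged independently of the others.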

Will the second sort benefit from the results of the sorts in the sort-merge join?

In this case it will benefit, not because of the sort itself but because of the sort-merge join.

How

  1. If the ordering required by the second sort node were the same as the one produced for the sort-merge join (ascending by default), there would be no second sort node at all. Here the second sort also orders by __$start_lsn and __$seqval in descending order, so the orderings differ and the sort node remains.
  2. The second sort uses the same leading key as the join node, so the data is already partitioned on that key and only the sort itself has to be performed, in the requested order. The shuffle exchange (an expensive operation) is avoided because the sort-merge join has already taken care of it.
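
The two points above can be sketched as a small decision function. This is a simplified toy model of the kind of check a query planner makes, not Spark's actual code: `ordering_satisfies`, `plan_operators`, and the partitioning rule they use are hypothetical and exist only for illustration.

```python
def ordering_satisfies(provided, required):
    """True if `required` is a prefix of `provided`.

    Each ordering is a list of (column, direction) tuples.
    """
    return len(required) <= len(provided) and provided[:len(required)] == required

def plan_operators(child_partition_keys, child_ordering, required_keys, required_ordering):
    """Return the extra operators that must be placed above the child node."""
    ops = []
    # Toy rule (assumption): rows with equal required keys are co-located when the
    # child's hash-partitioning keys are a non-empty subset of the required keys.
    needs_exchange = (not child_partition_keys
                      or not set(child_partition_keys) <= set(required_keys))
    if needs_exchange:
        ops.append("Exchange")
    # A shuffle destroys any existing order, so a sort is always needed after one.
    if required_ordering and (needs_exchange
                              or not ordering_satisfies(child_ordering, required_ordering)):
        ops.append("Sort")
    return ops

# Output of the sort-merge join: partitioned and sorted on uniqid.
join_keys = ["uniqid"]
join_ordering = [("uniqid", "ASC")]
# Ordering requested by the upper Sort node in the question.
upper_ordering = [("uniqid", "ASC"), ("__$start_lsn", "DESC"), ("__$seqval", "DESC")]

print(plan_operators(join_keys, join_ordering, ["uniqid"], upper_ordering))
print(plan_operators(join_keys, join_ordering, ["uniqid"], [("uniqid", "ASC")]))
```

Under these assumptions the first call yields only a `Sort` (no `Exchange`), and the second yields nothing at all, which mirrors the reasoning in the two list items.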

Upvotes: 1
