partition by values from another related row

Question

Im trying to identify the latest merge transaction in a table for each account. This table is populated with account merges, latest merge transaction being the larger merge_tx_id

Source table:

| merge_tx_id | merge_from_id | merge_to_id |
|-------------|---------------|-------------|
|           1 |           546 |         100 |
|           2 |           789 |         802 |
|           3 |           123 |         456 |
|          33 |           456 |         123 |
|         222 |           802 |         789 |
|         333 |           123 |         456 |

merge_tx_id IN (3,33,333) are the same account based on the merge_from_id or merge_to_id appearing in another row, in this example a merge was performed then 'unmerged' again later. the latest id should be '456'

| merge_tx_id | merge_from_id | merge_to_id |
|-------------|---------------|-------------|
|           3 |           123 |         456 |
|          33 |           456 |         123 |
|         333 |           123 |         456 |

merge_tx_id IN (2,222) are the same account based on the merge_from_id or merge_to_id appearing in another row, the latest is is '789'

| merge_tx_id | merge_from_id | merge_to_id |
|-------------|---------------|-------------|
|           2 |           789 |         802 |
|         222 |           802 |         789 |

merge_tx_id IN (1) is by not related to any of the other rows, latest id is '100'

| merge_tx_id | merge_from_id | merge_to_id |
|-------------|---------------|-------------|
|           1 |           546 |         100 |

I think I need to join the table to itself and perform a RANK but am unclear how, if i can, perform it a partition on (column 1 OR column 2).

The desired result is

| merge_tx_id | merge_from_id | merge_to_id |
|-------------|---------------|-------------|
|           1 |           546 |         100 |
|         333 |           123 |         456 |
|         222 |           802 |         789 |

example: http://sqlfiddle.com/#!3/9a94b6/11

Giorgos Betsos · Accepted Answer

You can use the following query:

SELECT merge_tx_id, merge_from_id, merge_to_id
FROM (
SELECT merge_tx_id, merge_from_id, merge_to_id,
       ROW_NUMBER() OVER (PARTITION BY x.merge1, y.merge2
                          ORDER BY merge_tx_id DESC) AS rn
FROM merge_grp
CROSS APPLY (SELECT CASE WHEN merge_from_id < merge_to_id 
                            THEN merge_from_id
                            ELSE merge_to_id
                    END AS merge1) AS x
CROSS APPLY (SELECT CASE WHEN merge_from_id >= merge_to_id 
                            THEN merge_from_id
                            ELSE merge_to_id
                         END AS merge2) AS y  ) t
WHERE t.rn = 1

The query employs two CROSS APPLY clauses, in order to re-arrange merge_from_id, merge_to_id as two equivalent columns, merge1, merge2, which are in ascending order. ROW_NUMBER uses these two calculated columns to partition rows and hence enable selection of the row having the maximum merge_tx_id per merge1, merge2.

Demo here

partition by values from another related row

Answers (1)

Related Questions