DENSE_RANK breaking sequentially when changed

Question

The grp column should be represent a block partitioned by GroupId, RouteId and ordered LengthStart where as you travel through the blocks if the locationId remains the same the grp is the same. A break in the sequence creates a new grp

+---------+----------+--------------+------------+-------------+------+
| GROUPID |  ROUTEID |  LENGTHSTART |  LENGTHEND |  LOCATIONID |  GRP |
+---------+----------+--------------+------------+-------------+------+
|     1   | A        | 0            | 1          | 1           | 1    |
|     1   | A        | 1            | 2          | 1           | 1    |
|     1   | A        | 2            | 3          | 2           | 2    |
|     1   | A        | 3            | 4          | 1           | 3    |
|     2   | A        | 2            | 3          | 2           | 4    |
|     1   | B        | 2            | 3          | 2           | 5    |
|     1   | A        | 4            | 5          | 1           | 3    |
+---------+----------+--------------+------------+-------------+------+

My search on this problem led me to this solution: DENSE_RANK according to particular order

My attempt at making the grp calculation:

SELECT *, ROW_NUMBER() OVER (ORDER BY GroupId, RouteId, LengthStart) - ROW_NUMBER() OVER (PARTITION BY GroupId, RouteId, LocationId ORDER BY GroupId, RouteId, LengthStart) AS grp
FROM mytable

I tried to adopt that solution such that I could have more levels of partitioning and it works in really basic use-cases (like the one showcased above but in complicated scenarios its failing).

I don't really understand completely why two row_number() are subtracting and how it works out but it worked really well in the simple example.

I tried other approaches uses LAG but just fail to understand how to take the logic and apply it.

Here is a fiddle with the more complicated scenerio:

https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=3704dfe8583b0dd020b189184d149cb7

You can see one of the many mistakes I've been seeing highlighted here:

MatBailie · Accepted Answer

This looks to be a gaps-and-islands problem where you have mis-understood the solution.

In this case the grp column doesn't identify a gap or island on it's own, it just gives you the extra information you need to identify them.

In your case, the way you've implemented it (locationId, grp) forms the new "sub_group" identifier.

I'd also change your query slightly to make it easy to understand what's happening...

SELECT
  *,
    ROW_NUMBER() OVER (PARTITION BY GroupID, RouteID             ORDER BY LengthStart)
  - ROW_NUMBER() OVER (PARTITION BY GroupId, RouteId, LocationId ORDER BY LengthStart) AS grp
FROM
  mytable
ORDER BY
  GroupId, RouteId, LengthStart

That makes it more explicit that different GroupID, RouteID have nothing to do with each other when working out the gaps and islands (sub-groups).

Then I can show a simplified example...

  LocationID, Start, SetRowNum, LocRowNum, Difference, GroupID

      1        000       1       1          0          (1,0)
      1        100       2       2          0          (1,0)

      2        200       3         1          2        (2,2)
      2        300       4         2          2        (2,2)

      1        400       5       3          2          (1,2)

      2        500       6         3          3        (2,3)

      3        600       7           1          6      (3,6)

      2        700       8         4          4        (2,4)
      2        800       9         5          4        (2,4)

An island is worked out for each LocationID separately.

A gap is just any row for any other LocationID.

The "trick" is that within each island both the rownumbers are increasing together. By both increasing together, the difference remains the same. That difference is the island_id.

Then, during a gap, the first rownumber is increasing, such that when we reach the next island the difference between the two rownums have increased, giving a new island_id **for that LocationID.

Remember, when working out the island_id for location_1 we treat all other locations as gaps between location 1's islands.

For Location 1, we have islands at rows 1,2 and 5, with IslandID of 0 and 2 respectively.

  LocationID, Start, SetRowNum, LocRowNum, Difference, GROUP_ID

      1        000       1       1          0          (1,0)
      1        100       2       2          0          (1,0)

      GAP

      1        400       5       3          2          (1,2)

For Location 2, we have islands at rows 3,4, 6 and 8,9 with IslandID of 2, 3, 4 respectively.

  LocationID, Start, SetRowNum, LocRowNum, Difference

     GAP

      2        200       3         1          2        (2,2)
      2        300       4         2          2        (2,2)

     GAP

      2        500       6         3          3        (2,3)

     GAP

      2        700       8         4          4        (2,4)
      2        800       9         5          4        (2,4)

For Location 3, we have islands at row 7 with IslandID of 6.

  LocationID, Start, SetRowNum, LocRowNum, Difference

     GAP

      3        600       7           1          6      (3,6)

     GAP

All in all, the islands all have different ID's.

But Location1 and Location2 both have islands with IslandID = 2.

So, you need to use (locationID, IslandID) together
The composite key will be unique for that partition

DENSE_RANK breaking sequentially when changed

Answers (2)

Related Questions