Reputation: 899
The grp
column should be represent a block partitioned by GroupId, RouteId
and ordered LengthStart
where as you travel through the blocks if the locationId remains the same the grp
is the same. A break in the sequence creates a new grp
+---------+----------+--------------+------------+-------------+------+
| GROUPID | ROUTEID | LENGTHSTART | LENGTHEND | LOCATIONID | GRP |
+---------+----------+--------------+------------+-------------+------+
| 1 | A | 0 | 1 | 1 | 1 |
| 1 | A | 1 | 2 | 1 | 1 |
| 1 | A | 2 | 3 | 2 | 2 |
| 1 | A | 3 | 4 | 1 | 3 |
| 2 | A | 2 | 3 | 2 | 4 |
| 1 | B | 2 | 3 | 2 | 5 |
| 1 | A | 4 | 5 | 1 | 3 |
+---------+----------+--------------+------------+-------------+------+
My search on this problem led me to this solution: DENSE_RANK according to particular order
My attempt at making the grp
calculation:
SELECT *, ROW_NUMBER() OVER (ORDER BY GroupId, RouteId, LengthStart) - ROW_NUMBER() OVER (PARTITION BY GroupId, RouteId, LocationId ORDER BY GroupId, RouteId, LengthStart) AS grp
FROM mytable
I tried to adopt that solution such that I could have more levels of partitioning and it works in really basic use-cases (like the one showcased above but in complicated scenarios its failing).
I don't really understand completely why two row_number() are subtracting and how it works out but it worked really well in the simple example.
I tried other approaches uses LAG
but just fail to understand how to take the logic and apply it.
Here is a fiddle with the more complicated scenerio:
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=3704dfe8583b0dd020b189184d149cb7
You can see one of the many mistakes I've been seeing highlighted here:
Upvotes: 1
Views: 638
Reputation: 35603
By altering the parameters of the over clauses will produce different result. Note you don't need to repeat columns used for partitioning in the order by. The concept of this technique is to calculate a common value (in column grp
) that hat is often used in subsequent calculations such as; number of steps, or cost, or min/max etc.
SELECT
*
, ROW_NUMBER() OVER (ORDER BY GroupId, RouteId, LengthStart)
- ROW_NUMBER() OVER (PARTITION BY GroupId, RouteId ORDER BY LengthStart ) AS grp
FROM mytable
ORDER BY
GroupId
, RouteId
, LengthStart
It is useful to output the two row_number calculation so that you can see how this works:
Id GroupId RouteId LengthStart LengthEnd LocationId rn1 rn2 grp
--------- --------- --------- ------------- ----------- ------------ ----- ----- -----
2651246 3 AAA 0.0000000 0.0920000 1884268 1 1 0
2651247 3 AAA 0.0920000 0.5800000 1855305 2 2 0
2651248 3 AAA 0.5800000 1.3610000 1884268 3 3 0
2651249 3 AAA 1.3610000 1.6170000 1884268 4 4 0
2651250 3 AAA 1.6170000 2.3750000 1884268 5 5 0
2681493 3 BBB 0.0000000 1.5600000 1864963 6 1 5
2681494 3 BBB 1.5600000 2.7100000 1864963 7 2 5
2681495 3 BBB 2.7100000 3.3900000 1864963 8 3 5
2954915 3 CCC 0.0000000 0.0500000 1883382 9 1 8
2954916 3 CCC 0.0500000 0.1400000 1846300 10 2 8
…
The rn1
column starts at 1 and keeps incrementing. The second rn column restarts at 1 with each partition, but as this also increments by one on the same order sequence, when you deduct rn2
from rn1
you get a "constant" result for grp
for each partition used in rn2
Upvotes: 0
Reputation: 86765
This looks to be a gaps-and-islands problem where you have mis-understood the solution.
In this case the grp
column doesn't identify a gap or island on it's own, it just gives you the extra information you need to identify them.
In your case, the way you've implemented it (locationId, grp)
forms the new "sub_group" identifier.
I'd also change your query slightly to make it easy to understand what's happening...
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY GroupID, RouteID ORDER BY LengthStart)
- ROW_NUMBER() OVER (PARTITION BY GroupId, RouteId, LocationId ORDER BY LengthStart) AS grp
FROM
mytable
ORDER BY
GroupId, RouteId, LengthStart
That makes it more explicit that different GroupID, RouteID
have nothing to do with each other when working out the gaps and islands (sub-groups).
Then I can show a simplified example...
LocationID, Start, SetRowNum, LocRowNum, Difference, GroupID
1 000 1 1 0 (1,0)
1 100 2 2 0 (1,0)
2 200 3 1 2 (2,2)
2 300 4 2 2 (2,2)
1 400 5 3 2 (1,2)
2 500 6 3 3 (2,3)
3 600 7 1 6 (3,6)
2 700 8 4 4 (2,4)
2 800 9 5 4 (2,4)
An island
is worked out for each LocationID
separately.
A gap
is just any row for any other LocationID
.
The "trick" is that within each island both the rownumbers are increasing together. By both increasing together, the difference remains the same. That difference is the island_id
.
Then, during a gap, the first rownumber is increasing, such that when we reach the next island the difference between the two rownums have increased, giving a new island_id
**for that LocationID
.
Remember, when working out the island_id
for location_1
we treat all other locations as gaps
between location 1's islands.
For Location 1, we have islands at rows 1,2
and 5
, with IslandID
of 0
and 2
respectively.
LocationID, Start, SetRowNum, LocRowNum, Difference, GROUP_ID
1 000 1 1 0 (1,0)
1 100 2 2 0 (1,0)
GAP
1 400 5 3 2 (1,2)
For Location 2, we have islands at rows 3,4
, 6
and 8,9
with IslandID
of 2
, 3
, 4
respectively.
LocationID, Start, SetRowNum, LocRowNum, Difference
GAP
2 200 3 1 2 (2,2)
2 300 4 2 2 (2,2)
GAP
2 500 6 3 3 (2,3)
GAP
2 700 8 4 4 (2,4)
2 800 9 5 4 (2,4)
For Location 3, we have islands at row 7
with IslandID
of 6
.
LocationID, Start, SetRowNum, LocRowNum, Difference
GAP
3 600 7 1 6 (3,6)
GAP
All in all, the islands all have different ID's.
But Location1 and Location2 both have islands with IslandID = 2.
(locationID, IslandID)
together Upvotes: 1