Reputation: 117
I have a requirement to group every N rows within each group of a SQL table, the best answer I've found is this; https://stackoverflow.com/a/66806186/10916933 but my data is also grouped by another identifier (Batch) and I would like to start a new GroupID every time the identifier changes. Batches are not of a consistent size.
This is what I get currently (N = 2 for display purposes, it is actually 1000):
Batch | Value | RowID | GroupID |
---|---|---|---|
A | 10.2 | 1 | 1 |
A | 6.7 | 2 | 1 |
A | 7.6 | 3 | 2 |
B | 7.0 | 4 | 2 |
B | 10.2 | 5 | 3 |
C | 9.6 | 6 | 3 |
C | 8.2 | 7 | 4 |
C | 1.1 | 8 | 4 |
C | 0.3 | 9 | 5 |
C | 9.0 | 10 | 5 |
C | 10.2 | 11 | 6 |
This is what I want (i.e. when Batch B starts, it gives me a new GroupID even though there has only been one):
Batch | Value | RowID | GroupID |
---|---|---|---|
A | 10.2 | 1 | 1 |
A | 6.7 | 2 | 1 |
A | 7.6 | 3 | 2 |
B | 7.0 | 4 | 3 |
B | 10.2 | 5 | 3 |
C | 9.6 | 6 | 4 |
C | 8.2 | 7 | 4 |
C | 1.1 | 8 | 5 |
C | 0.3 | 9 | 5 |
C | 9.0 | 10 | 6 |
C | 10.2 | 11 | 6 |
Upvotes: 1
Views: 1499
Reputation: 4694
Gordon has the answer. Toss him that checkmark. Here's a slightly adjusted version:
CREATE TABLE batches (
Batch VARCHAR(4)
, Value DECIMAL(5, 2)
, RowID int
, GroupID int
);
INSERT INTO batches VALUES
('A' , 10.2 , 1 , 1 )
, ('A' , 6.7 , 2 , 1 )
, ('A' , 7.6 , 3 , 2 )
, ('B' , 7.0 , 4 , 2 )
, ('B' , 10.2 , 5 , 3 )
, ('C' , 9.6 , 6 , 3 )
, ('C' , 8.2 , 7 , 4 )
, ('C' , 1.1 , 8 , 4 )
, ('C' , 0.3 , 9 , 5 )
, ('C' , 9.0 ,10 , 5 )
, ('C' , 10.2 ,11 , 6 )
;
WITH xrows AS (
SELECT t.*
, ROW_NUMBER() OVER (PARTITION BY Batch ORDER BY RowID) % 2 AS is_start
FROM batches AS t
)
SELECT t.*
, SUM(is_start) OVER (ORDER BY Batch, RowID) AS new_GroupID
FROM xrows AS t
ORDER BY RowID
;
Result:
+-------+-------+-------+---------+----------+-------------+
| Batch | Value | RowID | GroupID | is_start | new_GroupID |
+-------+-------+-------+---------+----------+-------------+
| A | 10.20 | 1 | 1 | 1 | 1 |
| A | 6.70 | 2 | 1 | 0 | 1 |
| A | 7.60 | 3 | 2 | 1 | 2 |
| B | 7.00 | 4 | 2 | 1 | 3 |
| B | 10.20 | 5 | 3 | 0 | 3 |
| C | 9.60 | 6 | 3 | 1 | 4 |
| C | 8.20 | 7 | 4 | 0 | 4 |
| C | 1.10 | 8 | 4 | 1 | 5 |
| C | 0.30 | 9 | 5 | 0 | 5 |
| C | 9.00 | 10 | 5 | 1 | 6 |
| C | 10.20 | 11 | 6 | 0 | 6 |
+-------+-------+-------+---------+----------+-------------+
Upvotes: 1
Reputation: 1269973
Hmmm . . . You can indicate where a group starts using lag()
and row number arithmetic. Then do a cumulative sum. A new group starts on the odd numbered rows of each batch:
select t.*,
sum(is_start) over (order by batch, rowid) as groupid
from (select t.*,
(case when row_number() over (partition by batch order by rowid) % 2 = 1
then 1 else 0
end) as is_start
from t
) t
Upvotes: 2