PharmDataSci
PharmDataSci

Reputation: 117

Group by every N rows within a group

I have a requirement to group every N rows within each group of a SQL table, the best answer I've found is this; https://stackoverflow.com/a/66806186/10916933 but my data is also grouped by another identifier (Batch) and I would like to start a new GroupID every time the identifier changes. Batches are not of a consistent size.

This is what I get currently (N = 2 for display purposes, it is actually 1000):

Batch Value RowID GroupID
A 10.2 1 1
A 6.7 2 1
A 7.6 3 2
B 7.0 4 2
B 10.2 5 3
C 9.6 6 3
C 8.2 7 4
C 1.1 8 4
C 0.3 9 5
C 9.0 10 5
C 10.2 11 6

This is what I want (i.e. when Batch B starts, it gives me a new GroupID even though there has only been one):

Batch Value RowID GroupID
A 10.2 1 1
A 6.7 2 1
A 7.6 3 2
B 7.0 4 3
B 10.2 5 3
C 9.6 6 4
C 8.2 7 4
C 1.1 8 5
C 0.3 9 5
C 9.0 10 6
C 10.2 11 6

Upvotes: 1

Views: 1499

Answers (2)

Jon Armstrong
Jon Armstrong

Reputation: 4694

Gordon has the answer. Toss him that checkmark. Here's a slightly adjusted version:

CREATE TABLE batches (
    Batch   VARCHAR(4)
  , Value   DECIMAL(5, 2)
  , RowID   int
  , GroupID int
);

INSERT INTO batches VALUES
  ('A' , 10.2 , 1 , 1 )
, ('A' ,  6.7 , 2 , 1 )
, ('A' ,  7.6 , 3 , 2 )
, ('B' ,  7.0 , 4 , 2 )
, ('B' , 10.2 , 5 , 3 )
, ('C' ,  9.6 , 6 , 3 )
, ('C' ,  8.2 , 7 , 4 )
, ('C' ,  1.1 , 8 , 4 )
, ('C' ,  0.3 , 9 , 5 )
, ('C' ,  9.0 ,10 , 5 )
, ('C' , 10.2 ,11 , 6 )
;

WITH xrows AS (
       SELECT t.*
            , ROW_NUMBER() OVER (PARTITION BY Batch ORDER BY RowID) % 2 AS is_start
         FROM batches AS t
     )
SELECT t.*
     , SUM(is_start) OVER (ORDER BY Batch, RowID) AS new_GroupID
  FROM xrows AS t
 ORDER BY RowID
;

Result:

+-------+-------+-------+---------+----------+-------------+
| Batch | Value | RowID | GroupID | is_start | new_GroupID |
+-------+-------+-------+---------+----------+-------------+
| A     | 10.20 |     1 |       1 |        1 |           1 |
| A     |  6.70 |     2 |       1 |        0 |           1 |
| A     |  7.60 |     3 |       2 |        1 |           2 |
| B     |  7.00 |     4 |       2 |        1 |           3 |
| B     | 10.20 |     5 |       3 |        0 |           3 |
| C     |  9.60 |     6 |       3 |        1 |           4 |
| C     |  8.20 |     7 |       4 |        0 |           4 |
| C     |  1.10 |     8 |       4 |        1 |           5 |
| C     |  0.30 |     9 |       5 |        0 |           5 |
| C     |  9.00 |    10 |       5 |        1 |           6 |
| C     | 10.20 |    11 |       6 |        0 |           6 |
+-------+-------+-------+---------+----------+-------------+

Full test case

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269973

Hmmm . . . You can indicate where a group starts using lag() and row number arithmetic. Then do a cumulative sum. A new group starts on the odd numbered rows of each batch:

select t.*,
       sum(is_start) over (order by batch, rowid) as groupid
from (select t.*,
             (case when row_number() over (partition by batch order by rowid) % 2 = 1
                   then 1 else 0
              end) as is_start
      from t
     ) t

Upvotes: 2

Related Questions