Grouping SQL Results based on order

I have table with data something like this:

ID     | RowNumber     | Data
------------------------------
1      | 1             | Data
2      | 2             | Data
3      | 3             | Data
4      | 1             | Data
5      | 2             | Data
6      | 1             | Data
7      | 2             | Data
8      | 3             | Data
9      | 4             | Data

I want to group each set of RowNumbers So that my result is something like this:

ID     | RowNumber     | Group | Data
--------------------------------------
1      | 1             | a     | Data
2      | 2             | a     | Data
3      | 3             | a     | Data
4      | 1             | b     | Data
5      | 2             | b     | Data
6      | 1             | c     | Data
7      | 2             | c     | Data
8      | 3             | c     | Data
9      | 4             | c     | Data

The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.

Additional Info

ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.

Upvotes: 7

Answers (4)

Martin Smith

Reputation: 453233

For the clarified requirements in the comments

The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6" ... a higher number followed by a lower would be a new group.

A SQL Server 2012 solution could be as follows.

Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.

Code

WITH T1 AS
(
SELECT *,
       LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
       IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
        RowNumber,
        Data,
        SUM(NewGroup) OVER (ORDER BY ID 
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS  Grp
FROM T2

SQL Fiddle

Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.

Upvotes: 6

muhmud

Reputation: 4604

How about:

select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
     select *, (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
     from [Your Table] t
) t
order by ID

This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.

Upvotes: 1

Oleksandr Fedorenko

Reputation: 16904

Also you can use recursive CTE

 ;WITH cte AS
 (       
  SELECT ID, RowNumber, Data, 1 AS [Group]
  FROM dbo.test1
  WHERE ID = 1
  UNION ALL
  SELECT t.ID, t.RowNumber, t.Data, 
         CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
  FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
  )
  SELECT *
  FROM cte

Demo on SQLFiddle

Upvotes: 1

Gordon Linoff

Reputation: 1269753

If the ids are truly sequential, you can do:

select t.*,
       (id - rowNumber) as grp
from t

Upvotes: 2

Grouping SQL Results based on order

Answers (4)

Related Questions