JohnsonLee
JohnsonLee

Reputation: 75

SQL to get group number

I have an example data as below.

    +---------+------------+--------+-----------+
    | User Id |   Sequence | Action | Object    |
    |---------|------------|--------|-----------|
    | 12345   |    1       | Eat    | Bread     |
    | 12345   |    2       | Eat    | Steak     |
    | 12345   |    3       | Eat    | Bread     |
    | 12345   |    4       | Drink  | Milk tea  |
    | 12345   |    5       | Drink  | Black tea |  
    | 12345   |    6       | Eat    | Cake      |
    | 12345   |    7       | Eat    | Candy     |
    | 12345   |    8       | Drink  | Black tea | 
    | 12345   |    9       | Drink  | Green tea | 
    | 12345   |    10      | Drink  | Water     |
    +---------+------------+--------+-----------+

Now I'd like adding one column called 'Group Id' in the table, the result should be like this:

    +---------+------------+--------+-----------+-----------+
    | User Id |   Sequence | Action | Object    | Group Id. |
    |---------|------------|--------|-----------|-----------|
    | 12345   |    1       | Eat    | Bread     |     1     |
    | 12345   |    2       | Eat    | Steak     |     1     |
    | 12345   |    3       | Eat    | Bread     |     1     |
    | 12345   |    4       | Drink  | Milk tea  |     2     |
    | 12345   |    5       | Drink  | Black tea |     2     |
    | 12345   |    6       | Eat    | Cake      |     3     |
    | 12345   |    7       | Eat    | Candy     |     3     |
    | 12345   |    8       | Drink  | Black tea |     4     |
    | 12345   |    9       | Drink  | Green tea |     4     |
    | 12345   |    10      | Drink  | Water     |     4     |
    +---------+------------+--------+-----------+-----------|

The same action should be divided into a group, but will be separated by a different order. How can I implement SQL(I use Google Bigquery)?

Thanks a million!

Upvotes: 0

Views: 53

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173036

Below is for BigQuery Standard SQL

#standardSQL
SELECT * EXCEPT(new_group),
  COUNTIF(new_group) OVER(PARTITION BY User_Id ORDER BY Sequence) Group_Id
FROM (
  SELECT *,
    Action != LAG(Action, 1, '') OVER(PARTITION BY User_Id ORDER BY Sequence) new_group
  FROM `project.dataset.table`
)
-- ORDER BY User_Id     

If to apply to sample data from your question - the output is

Row User_Id Sequence    Action  Object      Group_Id     
1   12345   1           Eat     Bread       1    
2   12345   2           Eat     Steak       1    
3   12345   3           Eat     Bread       1    
4   12345   4           Drink   Milk tea    2    
5   12345   5           Drink   Black tea   2    
6   12345   6           Eat     Cake        3    
7   12345   7           Eat     Candy       3    
8   12345   8           Drink   Black tea   4    
9   12345   9           Drink   Green tea   4    
10  12345   10          Drink   Water       4    

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1270191

This is a type of gaps-and-islands problem. A simple method uses lag() to determine where a change occurs than a cumulative sum:

select t.*,
       1 + sum( case when prev_action = action then 0 else 1 end ) over (order by sequence) as group_id
from (select t.*,
             lag(action) over (order by sequence) as prev_action
      from t
     ) t;

You can also phrase the outer logic using countif():

1 + countif( prev_action <> acction ) over (order by sequence) as group_id

Upvotes: 2

Related Questions