Reputation: 75
I have an example data as below.
+---------+------------+--------+-----------+
| User Id | Sequence | Action | Object |
|---------|------------|--------|-----------|
| 12345 | 1 | Eat | Bread |
| 12345 | 2 | Eat | Steak |
| 12345 | 3 | Eat | Bread |
| 12345 | 4 | Drink | Milk tea |
| 12345 | 5 | Drink | Black tea |
| 12345 | 6 | Eat | Cake |
| 12345 | 7 | Eat | Candy |
| 12345 | 8 | Drink | Black tea |
| 12345 | 9 | Drink | Green tea |
| 12345 | 10 | Drink | Water |
+---------+------------+--------+-----------+
Now I'd like adding one column called 'Group Id' in the table, the result should be like this:
+---------+------------+--------+-----------+-----------+
| User Id | Sequence | Action | Object | Group Id. |
|---------|------------|--------|-----------|-----------|
| 12345 | 1 | Eat | Bread | 1 |
| 12345 | 2 | Eat | Steak | 1 |
| 12345 | 3 | Eat | Bread | 1 |
| 12345 | 4 | Drink | Milk tea | 2 |
| 12345 | 5 | Drink | Black tea | 2 |
| 12345 | 6 | Eat | Cake | 3 |
| 12345 | 7 | Eat | Candy | 3 |
| 12345 | 8 | Drink | Black tea | 4 |
| 12345 | 9 | Drink | Green tea | 4 |
| 12345 | 10 | Drink | Water | 4 |
+---------+------------+--------+-----------+-----------|
The same action should be divided into a group, but will be separated by a different order. How can I implement SQL(I use Google Bigquery)?
Thanks a million!
Upvotes: 0
Views: 53
Reputation: 173036
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(new_group),
COUNTIF(new_group) OVER(PARTITION BY User_Id ORDER BY Sequence) Group_Id
FROM (
SELECT *,
Action != LAG(Action, 1, '') OVER(PARTITION BY User_Id ORDER BY Sequence) new_group
FROM `project.dataset.table`
)
-- ORDER BY User_Id
If to apply to sample data from your question - the output is
Row User_Id Sequence Action Object Group_Id
1 12345 1 Eat Bread 1
2 12345 2 Eat Steak 1
3 12345 3 Eat Bread 1
4 12345 4 Drink Milk tea 2
5 12345 5 Drink Black tea 2
6 12345 6 Eat Cake 3
7 12345 7 Eat Candy 3
8 12345 8 Drink Black tea 4
9 12345 9 Drink Green tea 4
10 12345 10 Drink Water 4
Upvotes: 2
Reputation: 1270191
This is a type of gaps-and-islands problem. A simple method uses lag()
to determine where a change occurs than a cumulative sum:
select t.*,
1 + sum( case when prev_action = action then 0 else 1 end ) over (order by sequence) as group_id
from (select t.*,
lag(action) over (order by sequence) as prev_action
from t
) t;
You can also phrase the outer logic using countif()
:
1 + countif( prev_action <> acction ) over (order by sequence) as group_id
Upvotes: 2