Reputation: 75
I have an example data as below.
+---------+------------+--------+
| user id | sequence | Action |
|---------|------------|--------|
| 12345 | 1 | Run |
| 12345 | 2 | Sit |
| 12345 | 3 | Sit |
| 12345 | 4 | Run |
| 12345 | 5 | Run |
| 12345 | 6 | Sit |
+---------+------------+--------+
Now I'd like the result should be like this:
+---------+---------+
| user id | Action |
|---------|---------|
| 12345 | Run |
| 12345 | Sit |
| 12345 | Run |
| 12345 | Sit |
+---------+---------+
The row with sequence #2 and #3 should be merged, #4 and #5 should be merged. I use 'group by Action' will get answer like the following table, but it isn't what I want:
+---------+---------+
| user id | Action |
|---------|---------|
| 12345 | Run |
| 12345 | Sit |
+---------+---------+
How can I implement SQL(I use Google Bigquery)?
Thanks a million!
Upvotes: 1
Views: 52
Reputation: 172944
Below for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(dup) FROM (
SELECT *, action = LAG(action, 1, '') OVER(PARTITION BY user_id ORDER BY sequence) AS dup
FROM `project.dataset.table`
)
WHERE NOT dup
If to apply to sample data from your question - output is
Row user_id sequence action
1 12345 1 Run
2 12345 2 Sit
3 12345 4 Run
4 12345 6 Sit
Upvotes: 1
Reputation: 222402
You can use window functions: the idea is to compare the action on each row to the "previous" action, and filter on the rows where the value changes:
select *
from (
select t.*, lag(action) over(partition by user_id order by sequence) lag_action
from mytable t
) t
where action <> lag_action or lag_action is null
Upvotes: 0