JohnsonLee
JohnsonLee

Reputation: 75

SQL to get some groups and keep the order

I have an example data as below.

+---------+------------+--------+
| user id |   sequence | Action |
|---------|------------|--------|
| 12345   |    1       | Run    |
| 12345   |    2       | Sit    |
| 12345   |    3       | Sit    |
| 12345   |    4       | Run    |
| 12345   |    5       | Run    |
| 12345   |    6       | Sit    |
+---------+------------+--------+

Now I'd like the result should be like this:

+---------+---------+
| user id |  Action |
|---------|---------|
| 12345   |  Run    |
| 12345   |  Sit    |
| 12345   |  Run    |
| 12345   |  Sit    |
+---------+---------+

The row with sequence #2 and #3 should be merged, #4 and #5 should be merged. I use 'group by Action' will get answer like the following table, but it isn't what I want:

+---------+---------+
| user id |  Action |
|---------|---------|
| 12345   |  Run    |
| 12345   |  Sit    |
+---------+---------+

How can I implement SQL(I use Google Bigquery)?

Thanks a million!

Upvotes: 1

Views: 52

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below for BigQuery Standard SQL

#standardSQL
SELECT * EXCEPT(dup) FROM (
  SELECT *, action = LAG(action, 1, '') OVER(PARTITION BY user_id ORDER BY sequence) AS dup
  FROM `project.dataset.table`
)
WHERE NOT dup

If to apply to sample data from your question - output is

Row user_id sequence    action   
1   12345   1           Run  
2   12345   2           Sit  
3   12345   4           Run  
4   12345   6           Sit  

Upvotes: 1

GMB
GMB

Reputation: 222402

You can use window functions: the idea is to compare the action on each row to the "previous" action, and filter on the rows where the value changes:

select *
from (
    select t.*, lag(action) over(partition by user_id order by sequence) lag_action
    from mytable t
) t
where action <> lag_action or lag_action is null

Upvotes: 0

Related Questions