Removing consecutive duplicate rows bigquery

Question

I have some event data that are grouped into sessions. In some cases there are consecutive duplicate events within those sessions which I'd like to remove. Is there a way to do this in bigquery?

E.g.

Initial table:

session   eventType   eventOrder
1         a           1
1         a           2     
1         b           3
2         b           1
2         a           2
2         b           3

Desired result:

session   eventType   eventOrder
1         a           1
1         b           3
2         b           1
2         a           2
2         b           3

GMB · Accepted Answer

You could use lag(). The idea is to compare the current event type to the previous value for the same session, and eliminate records where they match:

select session, eventType, eventOrder
from (
    select 
        t.*,
        lag(eventType) over(partition by session order by eventOrder) lag_eventType
    from mytable t
) t
where lag_eventType is null or eventType != lag_eventType

Removing consecutive duplicate rows bigquery

Answers (1)

Related Questions