Thomas Chamberlain
Thomas Chamberlain

Reputation: 127

Running Count by Group and Flag in BigQuery?

I have a table that looks like the below:

Row | Fullvisitorid | Visitid | New_Session_Flag 
1   | A             | 111     | 1
2   | A             | 120     | 0
3   | A             | 128     | 0
4   | A             | 133     | 0
5   | A             | 745     | 1
6   | A             | 777     | 0
7   | B             | 388     | 1
8   | B             | 401     | 0
9   | B             | 420     | 0
10  | B             | 777     | 1
11  | B             | 784     | 0
12  | B             | 791     | 0
13  | B             | 900     | 1  
14  | B             | 904     | 0  

What I want to do is if it's the first row for a fullvisitorid then mark the field as 1, otherwise use the above row as the value, but if the new_session_flag = 1 then use the above row plus 1, example of output I'm looking for below:

Row | Fullvisitorid | Visitid | New_Session_Flag | Rank_Session_Order
1   | A             | 111     | 1                | 1
2   | A             | 120     | 0                | 1
3   | A             | 128     | 0                | 1
4   | A             | 133     | 0                | 1
5   | A             | 745     | 1                | 2
6   | A             | 777     | 0                | 2
7   | B             | 388     | 1                | 1
8   | B             | 401     | 0                | 1
9   | B             | 420     | 0                | 1
10  | B             | 777     | 1                | 2
11  | B             | 784     | 0                | 2
12  | B             | 791     | 0                | 2
13  | B             | 900     | 1                | 3
14  | B             | 904     | 0                | 3

As you can see:

I believe this can be done through a retain statement in SAS but is there an equivalent in Google BigQquery?

Hopefully the above makes sense, let me know if not.

Thanks in advance

Upvotes: 2

Views: 609

Answers (2)

GMB
GMB

Reputation: 222482

The answer by Mikhail Berlyant using a conditional window count is corret and works. I am answering because I find that a window sum is even simpler (and possibly more efficient on a large dataset):

select
    t.*,
    sum(new_session_flag) over(partition by fullvisitorid order by visid_id) rank_session_order
from mytable t

This works because the new_session_flag contains 0s and 1s only; so counting the 1s is actually equivalent to suming all values.

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173003

Below is for BigQuery Standard SQL

#standardSQL
SELECT *,
  COUNTIF(New_Session_Flag = 1) OVER(PARTITION BY Fullvisitorid  ORDER BY Visitid) Rank_Session_Order
FROM `project.dataset.table`

Upvotes: 3

Related Questions