Reputation: 127
I have a table that looks like the below:
Row | Fullvisitorid | Visitid | New_Session_Flag
1 | A | 111 | 1
2 | A | 120 | 0
3 | A | 128 | 0
4 | A | 133 | 0
5 | A | 745 | 1
6 | A | 777 | 0
7 | B | 388 | 1
8 | B | 401 | 0
9 | B | 420 | 0
10 | B | 777 | 1
11 | B | 784 | 0
12 | B | 791 | 0
13 | B | 900 | 1
14 | B | 904 | 0
What I want to do is if it's the first row for a fullvisitorid then mark the field as 1, otherwise use the above row as the value, but if the new_session_flag = 1 then use the above row plus 1, example of output I'm looking for below:
Row | Fullvisitorid | Visitid | New_Session_Flag | Rank_Session_Order
1 | A | 111 | 1 | 1
2 | A | 120 | 0 | 1
3 | A | 128 | 0 | 1
4 | A | 133 | 0 | 1
5 | A | 745 | 1 | 2
6 | A | 777 | 0 | 2
7 | B | 388 | 1 | 1
8 | B | 401 | 0 | 1
9 | B | 420 | 0 | 1
10 | B | 777 | 1 | 2
11 | B | 784 | 0 | 2
12 | B | 791 | 0 | 2
13 | B | 900 | 1 | 3
14 | B | 904 | 0 | 3
As you can see:
Row 1 is 1 because it's the first time fullvisitorid A appears
Row 2 is 1 because it's not the first time fullvisitorid A appears and new_session_flag <> 1 therefore it uses the above row (i.e. 1)
Row 5 is 2 because it's not the first time fullvisitorid A appears and new_session_Flag = 1 therefore it uses the above row (i.e 1) plus 1
Row 7 is 1 because it's the first time fullvisitorid B appears
etc.
I believe this can be done through a retain
statement in SAS but is there an equivalent in Google BigQquery?
Hopefully the above makes sense, let me know if not.
Thanks in advance
Upvotes: 2
Views: 609
Reputation: 222482
The answer by Mikhail Berlyant using a conditional window count is corret and works. I am answering because I find that a window sum is even simpler (and possibly more efficient on a large dataset):
select
t.*,
sum(new_session_flag) over(partition by fullvisitorid order by visid_id) rank_session_order
from mytable t
This works because the new_session_flag
contains 0
s and 1
s only; so counting the 1
s is actually equivalent to suming all values.
Upvotes: 0
Reputation: 173003
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
COUNTIF(New_Session_Flag = 1) OVER(PARTITION BY Fullvisitorid ORDER BY Visitid) Rank_Session_Order
FROM `project.dataset.table`
Upvotes: 3