athew

Reputation: 55

BigQuery all time uniqueness by rows

I have a question very similar to one I asked a few weeks ago: Get all time unique values in BigQuery

I have a table like this:

ID Day Value
1 2021-09-01 a
2 2021-09-01 b
3 2021-09-01 c
4 2021-09-02 d
5 2021-09-02 a
6 2021-09-02 a
7 2021-09-02 e
8 2021-09-03 c
9 2021-09-03 f
10 2021-09-03 a

I'd like to count how many distinct values I have per day and over all time, but the all-time uniqueness should only take earlier dates into account (the business logic is that I want to know whether the user is new). The difference from my previous question is that I'd like to keep every row and show the uniqueness per row, as a new column. It's almost the same as the new vs. returning user concept in Google Analytics: if a user visits the site on 2021-09-02 and again on 2021-09-03, I'd like to see them as a New user on 2021-09-02 and as a Returning user on 2021-09-03. So I'd like to see this output:

ID Day Value Type
1 2021-09-01 a New
2 2021-09-01 b New
3 2021-09-01 c New
4 2021-09-02 d New
5 2021-09-02 a Returning
6 2021-09-02 a Returning
7 2021-09-02 e New
8 2021-09-03 c Returning
9 2021-09-03 f New
10 2021-09-03 a Returning

I can do this when I only look at a single day, but I can't do it across the whole table because of the check against earlier dates.

Upvotes: 0

Views: 95

Answers (2)

Mikhail Berlyant

Reputation: 173028

Consider also the approach below:

select *, if(0 = count(*) over prev_days, 'New', 'Returning') as type
from your_table
window prev_days as (
  partition by value order by unix_date(date(day)) 
  range between unbounded preceding and 1 preceding 
)
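
To try the window query above without a real table, a self-contained sketch could inline the question's rows. The CTE name `sample_rows` and the DATE-typed `day` column are assumptions made for illustration; because `day` is already a DATE here, `unix_date(day)` is applied directly.

```sql
-- Hypothetical inline data: the ten rows from the question, with day typed as DATE.
with sample_rows as (
  select * from unnest([
    struct(1 as id, date '2021-09-01' as day, 'a' as value),
    (2,  date '2021-09-01', 'b'),
    (3,  date '2021-09-01', 'c'),
    (4,  date '2021-09-02', 'd'),
    (5,  date '2021-09-02', 'a'),
    (6,  date '2021-09-02', 'a'),
    (7,  date '2021-09-02', 'e'),
    (8,  date '2021-09-03', 'c'),
    (9,  date '2021-09-03', 'f'),
    (10, date '2021-09-03', 'a')
  ])
)
select
  *,
  -- An empty frame means the value was never seen on an earlier day.
  if(count(*) over prev_days = 0, 'New', 'Returning') as type
from sample_rows
window prev_days as (
  partition by value
  order by unix_date(day)  -- numeric ordering key, required for a RANGE offset
  range between unbounded preceding and 1 preceding  -- strictly earlier days only
)
order by id
```

With this data, rows 5, 6, 8 and 10 should come back as Returning and the rest as New, since `a` and `c` are the only values that also occur on an earlier day. The frame ends one day before the current row's day, so rows that share a day never count one another.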

Upvotes: 0

Cleanbeans

Reputation: 676

It looks like you want to use the analytic functions detailed in this doc.

Using OVER with PARTITION BY, you can partition your data by value and then order it by date using ORDER BY. Then check whether a row is the first row in its partition and assign the type accordingly.

This query should get you what you want:

WITH data as(
    SELECT "2021-09-01" day,"a" value
    UNION ALL ( SELECT "2021-09-01", "b" )
    UNION ALL ( SELECT "2021-09-01", "c" )
    UNION ALL ( SELECT "2021-09-02", "d" )
    UNION ALL ( SELECT "2021-09-02", "a" )
    UNION ALL ( SELECT "2021-09-02", "a" )
    UNION ALL ( SELECT "2021-09-02", "e" )
    UNION ALL ( SELECT "2021-09-03", "c" )
    UNION ALL ( SELECT "2021-09-03", "f" )
    UNION ALL ( SELECT "2021-09-03", "a" )
    )
    
    SELECT day, value,
      IF(ROW_NUMBER() OVER (PARTITION BY value ORDER BY day) = 1, 'New','Returning') as type
    
    FROM data

Results

Row day value type
1 2021-09-01 a New
2 2021-09-02 a Returning
3 2021-09-02 a Returning
4 2021-09-03 a Returning
5 2021-09-01 b New
6 2021-09-01 c New
7 2021-09-03 c Returning
8 2021-09-02 d New
9 2021-09-02 e New
10 2021-09-03 f New

Amended with additional requirement

To give the New type to every row that shares its date with the value's first event, you can use another analytic function, FIRST_VALUE, and compare its result against the current row's date.

WITH data AS (
    SELECT "2021-09-01" day, "a" value
    UNION ALL ( SELECT "2021-09-01", "b" )
    UNION ALL ( SELECT "2021-09-01", "c" )
    UNION ALL ( SELECT "2021-09-02", "d" )
    UNION ALL ( SELECT "2021-09-02", "a" )
    UNION ALL ( SELECT "2021-09-01", "a" )
    UNION ALL ( SELECT "2021-09-02", "a" )
    UNION ALL ( SELECT "2021-09-02", "e" )
    UNION ALL ( SELECT "2021-09-03", "c" )
    UNION ALL ( SELECT "2021-09-03", "f" )
    UNION ALL ( SELECT "2021-09-03", "a" )
    )

    SELECT *,
      IF(ROW_NUMBER() OVER (PARTITION BY value ORDER BY day) = 1
         OR FIRST_VALUE(day) OVER (PARTITION BY value ORDER BY day) = day,
         'New', 'Returning') AS type
    FROM data

Result

Row day value type
1 2021-09-01 a New
2 2021-09-01 a New
3 2021-09-02 a Returning
4 2021-09-02 a Returning
5 2021-09-03 a Returning
6 2021-09-01 b New
7 2021-09-01 c New
8 2021-09-03 c Returning
9 2021-09-02 d New
10 2021-09-02 e New
11 2021-09-03 f New
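
The amended rule could also be stated more compactly, offered here as a sketch rather than as the answer's own method: a row is New when its day equals the earliest day on which its value occurs. `MIN(day) OVER (PARTITION BY value)` returns that earliest day on every row of the partition, so no `ROW_NUMBER` check is needed. The sample rows are the same eleven used in the amended query, and `day` stays a string, which compares correctly only because the dates are in ISO `YYYY-MM-DD` form.

```sql
WITH data AS (
    SELECT "2021-09-01" day, "a" value
    UNION ALL ( SELECT "2021-09-01", "b" )
    UNION ALL ( SELECT "2021-09-01", "c" )
    UNION ALL ( SELECT "2021-09-02", "d" )
    UNION ALL ( SELECT "2021-09-02", "a" )
    UNION ALL ( SELECT "2021-09-01", "a" )
    UNION ALL ( SELECT "2021-09-02", "a" )
    UNION ALL ( SELECT "2021-09-02", "e" )
    UNION ALL ( SELECT "2021-09-03", "c" )
    UNION ALL ( SELECT "2021-09-03", "f" )
    UNION ALL ( SELECT "2021-09-03", "a" )
    )

    SELECT day, value,
      -- New on the value's first day (all rows of that day), Returning afterwards.
      IF(day = MIN(day) OVER (PARTITION BY value), 'New', 'Returning') AS type
    FROM data
    ORDER BY value, day
```

This should label the same rows as the amended query: both `a` rows on 2021-09-01 are New, and every later `a` row is Returning.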

Upvotes: 1
