Chaouki
Chaouki

Reputation: 465

sql query using time series

I have the below table in bigquery:

 Timestamp        variant_id        activity
2020-04-02 08:50    1               active
2020-04-03 07:39    1               not_active
2020-04-04 07:40    1               active
2020-04-05 10:22    2               active
2020-04-07 07:59    2               not_active

I want to query this subset of data to get the number of active variant per day.

If variant_id 1 is active at date 2020-04-04, it still active the follwing dates also 2020-04-05, 2020-04-06 until the value activity column is not_active , the goal is to count each day the number of variant_id who has the value active in the column activity, but I should take into account that each variant_id has the value of the last activity on a specific date.

for example the result of the desired query in the subset data must be:

Date       activity_count
2020-04-02  1
2020-04-03  0
2020-04-04  1
2020-04-05  2
2020-04-06  2
2020-04-07  1
2020-04-08  1
2020-04-09  1
2020-04-10  1

any help please ?

Upvotes: 0

Views: 176

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173028

Consider below approach

select date, count(distinct if(activity = 'active', variant_id, null)) activity_count
from (
  select date(timestamp) date, variant_id, activity,
    lead(date(timestamp)) over(partition by variant_id order by timestamp) next_date
  from your_table
  ), unnest(generate_date_array(date, ifnull(next_date - 1, '2020-04-10'))) date 
group by date      

if applied to sample data in your question - output is

enter image description here

Upvotes: 2

Related Questions