Reputation: 27
Can someone please provide a query that I can use in Google Big Query to identify the total count of users for whom the value changed specifically from 'C' to 'P'? In the below table userid=123 satisfies this even though later userid = 123 changes back from 'P' to 'C'.
userid timestamp Value
123 9-15-2020 02:35:45 C
456 9-15-2020 01:45:09 P
789 9-15-2020 06:22:10 P
123 9-15-2020 03:43:00 P
456 9-15-2020 03:45:10 C
123 9-15-2020 07:40:34 C
Upvotes: 1
Views: 1492
Reputation: 172974
identify the total count of users for whom the value changed specifically from 'C' to 'P'?
Below is for BigQuery Standard SQL
#standardSQL
SELECT COUNT(DISTINCT userid) AS qualified_users
FROM `project.dataset.table`
GROUP BY userid
HAVING STRPOS(STRING_AGG(value, '' ORDER BY timestamp), 'CP') > 0
Note; I assume your timestamp column is of TIMESTAMP data type - otherwise you will need to use PARSE_TIMESTAMP in ORDER BY portion
Upvotes: 0
Reputation: 1269533
Can someone please provide a query that I can use in Google Big Query to identify the total count of users for whom the value changed specifically from 'C' to 'P'
Note that this is not consistent with the title of the question.
lag()
is the key idea. But it is unclear whether you want the count of users or the count of changes. This calculates both:
select count(*) as num_changes,
count(distinct userid) as num_users_with_change
from (select t.*,
lag(value) over(partition by userid order by timestamp) as prev_value
from tablename t
) t
where value = 'P' and prev_value = 'C';
The second column counts a user only once, regardless of the number of times they have changed (which is my interpretation of your question).
Upvotes: 0
Reputation: 37473
You can try using lag()
select userid from
(
select userid, timestamp, value, lag(value) over(partition by userid order by timestamp) as prev_value
from tablename
)A where value='P' and prev_value='C'
Upvotes: 2