Sharan
Sharan

Reputation: 27

How to Find Last Change row in SQL - Big Query

Can someone please provide a query that I can use in Google Big Query to identify the total count of users for whom the value changed specifically from 'C' to 'P'? In the below table userid=123 satisfies this even though later userid = 123 changes back from 'P' to 'C'.

userid    timestamp            Value
123       9-15-2020 02:35:45   C
456       9-15-2020 01:45:09   P
789       9-15-2020 06:22:10   P
123       9-15-2020 03:43:00   P
456       9-15-2020 03:45:10   C
123       9-15-2020 07:40:34   C

Upvotes: 1

Views: 1492

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

identify the total count of users for whom the value changed specifically from 'C' to 'P'?

Below is for BigQuery Standard SQL

#standardSQL
SELECT COUNT(DISTINCT userid) AS qualified_users
FROM `project.dataset.table` 
GROUP BY userid
HAVING STRPOS(STRING_AGG(value, '' ORDER BY timestamp), 'CP') > 0    

Note; I assume your timestamp column is of TIMESTAMP data type - otherwise you will need to use PARSE_TIMESTAMP in ORDER BY portion

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269533

Can someone please provide a query that I can use in Google Big Query to identify the total count of users for whom the value changed specifically from 'C' to 'P'

Note that this is not consistent with the title of the question.

lag() is the key idea. But it is unclear whether you want the count of users or the count of changes. This calculates both:

select count(*) as num_changes,
       count(distinct userid) as num_users_with_change
from (select t.*,
             lag(value) over(partition by userid order by timestamp) as prev_value
      from tablename t
     ) t
where value = 'P' and prev_value = 'C';

The second column counts a user only once, regardless of the number of times they have changed (which is my interpretation of your question).

Upvotes: 0

Fahmi
Fahmi

Reputation: 37473

You can try using lag()

select userid from
(
select userid, timestamp, value, lag(value) over(partition by userid order by timestamp) as prev_value
from tablename
)A where value='P' and prev_value='C'

Upvotes: 2

Related Questions