stkvtflw
stkvtflw

Reputation: 13587

BigQuery: Select value by last date per user

Sorry for the ambiguous title. Not sure how to name it.

user_id  | transaction_amount | date  
1        | 100                | 2019-12-15
1        | 250                | 2019-12-16 
2        | 300                | 2019-12-11
2        | 415                | 2019-12-12

I need to select latest transaction_amount per each user:

user_id  | transaction_amount | date  
1        | 250                | 2019-12-16 
2        | 415                | 2019-12-12

How do I do that?

Here is what I do now:

SELECT
  user_id,
  transaction_amount
FROM the_original_table VT1
  WHERE date = (
    SELECT
      MAX(date)
    FROM the_original_table VT2
    WHERE
      VT1.user_id = VT2.user_id
  )
GROUP BY
  user_id,
  date,
  transaction_amount

My approach looks incorrect, but it does the job. How do I do it better?

Upvotes: 4

Views: 11066

Answers (4)

Gordon Linoff
Gordon Linoff

Reputation: 1271141

In BigQuery, I would just use aggregation:

select array_agg(t order by date desc limit 1)[ordinal(1)].*
from the_original_table  t
group by user_id;

Since this question was asked, a better approach has been introduced using qualify:

select t.*
from the_original_table  t
where 1=1
qualify row_number() over (partition by user_id order by date desc) = 1;

Upvotes: 4

pmo511
pmo511

Reputation: 619

Use a cte with row_number instead of array_agg, easier to read an supported by most vendors:

with ctetbl as (
   select user_id, transaction_amount, date
         ,row_number() over (partition by user_id order by date desc) as rn
   from the_original_table
)
select user_id, transaction_amount, date
from ctetbl
where rn = 1

Upvotes: 4

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173190

Below is for BigQuery Standard SQL and very BigQuery'ish style

#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY `date` DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.the_original_table` t
GROUP BY user_id     

If to apply to sample data from your question as in example below

#standardSQL
WITH `project.dataset.the_original_table` AS (
  SELECT 1 user_id, 100 transaction_amount, '2019-12-15' `date` UNION ALL
  SELECT 1, 250, '2019-12-16' UNION ALL 
  SELECT 2, 300, '2019-12-11' UNION ALL
  SELECT 2, 415, '2019-12-12' 
)
SELECT AS VALUE ARRAY_AGG(t ORDER BY `date` DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.the_original_table` t
GROUP BY user_id     

result is

Row user_id transaction_amount  date     
1   1       250                 2019-12-16   
2   2       415                 2019-12-12   

Upvotes: 3

GMB
GMB

Reputation: 222702

Filtering with a correlated subquery is a good approach. You can drop the group by clause because it does not really make sense: since it is applying to all columns, it does nothing useful (apart from removing potential duplicates, which do not seem to occur here):

select t.*
from the_original_table t
where t.date = (
    select max(t1.date) from the_original_table t1 where t1.user_id = t.user_id
)

For performance, you want an index on (user_id, date).

Upvotes: 1

Related Questions