Reputation: 13587
Sorry for the ambiguous title. Not sure how to name it.
user_id | transaction_amount | date
1 | 100 | 2019-12-15
1 | 250 | 2019-12-16
2 | 300 | 2019-12-11
2 | 415 | 2019-12-12
I need to select latest transaction_amount per each user:
user_id | transaction_amount | date
1 | 250 | 2019-12-16
2 | 415 | 2019-12-12
How do I do that?
Here is what I do now:
SELECT
user_id,
transaction_amount
FROM the_original_table VT1
WHERE date = (
SELECT
MAX(date)
FROM the_original_table VT2
WHERE
VT1.user_id = VT2.user_id
)
GROUP BY
user_id,
date,
transaction_amount
My approach looks incorrect, but it does the job. How do I do it better?
Upvotes: 4
Views: 11066
Reputation: 1271141
In BigQuery, I would just use aggregation:
select array_agg(t order by date desc limit 1)[ordinal(1)].*
from the_original_table t
group by user_id;
Since this question was asked, a better approach has been introduced using qualify
:
select t.*
from the_original_table t
where 1=1
qualify row_number() over (partition by user_id order by date desc) = 1;
Upvotes: 4
Reputation: 619
Use a cte with row_number instead of array_agg, easier to read an supported by most vendors:
with ctetbl as (
select user_id, transaction_amount, date
,row_number() over (partition by user_id order by date desc) as rn
from the_original_table
)
select user_id, transaction_amount, date
from ctetbl
where rn = 1
Upvotes: 4
Reputation: 173190
Below is for BigQuery Standard SQL and very BigQuery'ish style
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY `date` DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.the_original_table` t
GROUP BY user_id
If to apply to sample data from your question as in example below
#standardSQL
WITH `project.dataset.the_original_table` AS (
SELECT 1 user_id, 100 transaction_amount, '2019-12-15' `date` UNION ALL
SELECT 1, 250, '2019-12-16' UNION ALL
SELECT 2, 300, '2019-12-11' UNION ALL
SELECT 2, 415, '2019-12-12'
)
SELECT AS VALUE ARRAY_AGG(t ORDER BY `date` DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.the_original_table` t
GROUP BY user_id
result is
Row user_id transaction_amount date
1 1 250 2019-12-16
2 2 415 2019-12-12
Upvotes: 3
Reputation: 222702
Filtering with a correlated subquery is a good approach. You can drop the group by
clause because it does not really make sense: since it is applying to all columns, it does nothing useful (apart from removing potential duplicates, which do not seem to occur here):
select t.*
from the_original_table t
where t.date = (
select max(t1.date) from the_original_table t1 where t1.user_id = t.user_id
)
For performance, you want an index on (user_id, date)
.
Upvotes: 1