Reputation: 327
In BigQuery, how can I get rows based on the latest value of timestamp field?
For example, I have this table.
first_name | last_name | use_auto | login_at |
---|---|---|---|
James | Davis | true | 2021-05-13 02:00:00 UTC |
James | Moore | true | 2021-05-13 02:00:01 UTC |
James | Green | true | 2021-05-13 02:00:02 UTC |
Edward | Green | false | 2021-05-13 03:00:00 UTC |
Edward | Wilson | false | 2021-05-13 03:00:01 UTC |
James | Davis | false | 2021-05-13 03:00:00 UTC |
James | Moore | false | 2021-05-13 03:00:01 UTC |
James | Green | false | 2021-05-13 03:00:02 UTC |
Edward | Green | true | 2021-05-13 02:00:00 UTC |
Edward | Wilson | true | 2021-05-13 02:00:00 UTC |
I'd like to get the result after query like this,
first_name | last_name | use_auto | login_at |
---|---|---|---|
Edward | Green | false | 2021-05-13 03:00:00 UTC |
Edward | Wilson | false | 2021-05-13 03:00:01 UTC |
James | Davis | false | 2021-05-13 03:00:00 UTC |
James | Moore | false | 2021-05-13 03:00:01 UTC |
James | Green | false | 2021-05-13 03:00:02 UTC |
Please let me know what query should I use.
Upvotes: 1
Views: 1765
Reputation: 26818
Not sure why both solution below doesn't work for me, I end up using window function like this:
SELECT first_name, last_name, use_auto, login_at
FROM (
SELECT first_name, last_name, use_auto, login_at
ROW_NUMBER() OVER (PARTITION BY first_name, last_name ORDER BY login_at DESC) AS rn
FROM thistable
) t
WHERE rn = 1
or CTE variant:
WITH t AS (
SELECT first_name, last_name, use_auto, login_at
ROW_NUMBER() OVER (PARTITION BY first_name, last_name ORDER BY login_at DESC) AS rn
FROM thistable
)
SELECT first_name, last_name, use_auto, login_at
FROM t
WHERE rn = 1
the logic are:
this also works if you need 2 latest row of each group, top 3 of group or something similar
NOTE: normally in other database (NOT BIGQUERY) that support multiple column matching (PostgreSQL for example), I would do something like this (don't forget to create index on for 3 group columns):
SELECT *
FROM thistable
WHERE (first_name, last_name, login_at) IN (
SELECT first_name, last_name, MAX(login_at)
FROM thistable
GROUP BY 1, 2
)
Upvotes: 0
Reputation: 10152
Following up on the Gordon's answer: add limit 1
to reduce memory consumption and increase scalability:
select array_agg(t order by login_at desc limit 1)[ordinal(1)].*
from thistable t
group by t.first_name, t.last_name;
Upvotes: 0
Reputation: 1269443
A convenient way in BigQuery uses aggregation:
select array_agg(t order by login_at desc)[ordinal(1)].*
from thistable t
group by first_name, last_name;
Upvotes: 1