HG K
HG K

Reputation: 327

How to get the latest rows in BigQuery

In BigQuery, how can I get rows based on the latest value of timestamp field?

For example, I have this table.

first_name last_name use_auto login_at
James Davis true 2021-05-13 02:00:00 UTC
James Moore true 2021-05-13 02:00:01 UTC
James Green true 2021-05-13 02:00:02 UTC
Edward Green false 2021-05-13 03:00:00 UTC
Edward Wilson false 2021-05-13 03:00:01 UTC
James Davis false 2021-05-13 03:00:00 UTC
James Moore false 2021-05-13 03:00:01 UTC
James Green false 2021-05-13 03:00:02 UTC
Edward Green true 2021-05-13 02:00:00 UTC
Edward Wilson true 2021-05-13 02:00:00 UTC

I'd like to get the result after query like this,

first_name last_name use_auto login_at
Edward Green false 2021-05-13 03:00:00 UTC
Edward Wilson false 2021-05-13 03:00:01 UTC
James Davis false 2021-05-13 03:00:00 UTC
James Moore false 2021-05-13 03:00:01 UTC
James Green false 2021-05-13 03:00:02 UTC

Please let me know what query should I use.

Upvotes: 1

Views: 1765

Answers (3)

Kokizzu
Kokizzu

Reputation: 26818

Not sure why both solution below doesn't work for me, I end up using window function like this:

SELECT first_name,  last_name,  use_auto, login_at  
FROM (
  SELECT first_name, last_name, use_auto, login_at
  ROW_NUMBER() OVER (PARTITION BY first_name, last_name ORDER BY login_at DESC) AS rn
  FROM thistable
) t
WHERE rn = 1

or CTE variant:

WITH t AS (
  SELECT first_name, last_name, use_auto, login_at
  ROW_NUMBER() OVER (PARTITION BY first_name, last_name ORDER BY login_at DESC) AS rn
  FROM thistable
)
SELECT first_name,  last_name,  use_auto, login_at  
FROM t
WHERE rn = 1

the logic are:

  1. create row number, partition by one that we want to group, in this case: first_name and last_name
  2. filter only first row number

this also works if you need 2 latest row of each group, top 3 of group or something similar

NOTE: normally in other database (NOT BIGQUERY) that support multiple column matching (PostgreSQL for example), I would do something like this (don't forget to create index on for 3 group columns):

SELECT *
FROM thistable
WHERE (first_name, last_name, login_at) IN (
  SELECT first_name, last_name, MAX(login_at)
  FROM thistable
  GROUP BY 1, 2
)

Upvotes: 0

Sergey Geron
Sergey Geron

Reputation: 10152

Following up on the Gordon's answer: add limit 1 to reduce memory consumption and increase scalability:

select array_agg(t order by login_at desc limit 1)[ordinal(1)].*
from thistable t
group by t.first_name, t.last_name;

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269443

A convenient way in BigQuery uses aggregation:

select array_agg(t order by login_at desc)[ordinal(1)].*
from thistable t
group by first_name, last_name;

Upvotes: 1

Related Questions