ljuk
ljuk

Reputation: 784

get latest row for each user_id in BigQuery based on two conditions

I have a Bigquery Table with 3 important columns:

user_id, created_at, updated_at.

Each user_id can have several records, for example:

+----------+------------+-------------+-----------------+
| user_id  | created_at | updated_at  |    task_name    |
+----------+------------+-------------+-----------------+
| user_1   | 2020-01-01 | 2020-01-02  | some_task_name1 |
| user_1   | 2020-01-01 | 2020-01-03  | some_task_name1 |
| user_1   | 2020-03-01 | 2020-03-01  | some_task_name1 |
| user_2   | 2020-01-01 | 2020-01-02  | some_task_name2 |
+----------+------------+-------------+-----------------+

For each user_id with the same created_at value, I want to select the record with the latest updated_at. So for example, in this example the output has to be:

    +----------+------------+-------------+-----------------+
    | user_id  | created_at | updated_at  |    task_name    |
    +----------+------------+-------------+-----------------+
    | user_1   | 2020-01-01 | 2020-01-03  | some_task_name1 |
    | user_1   | 2020-03-01 | 2020-03-01  | some_task_name1 |
    | user_2   | 2020-01-01 | 2020-01-02  | some_task_name2 |
    +----------+------------+-------------+-----------------+

I was trying this, but I don't know how to add the condition for the same created_at:

SELECT
  agg.table.*
FROM (
  SELECT
    user_id,
    ARRAY_AGG(STRUCT(table)
    ORDER BY
      updated_at DESC)[SAFE_OFFSET(0)] agg
  FROM
    `dataset.my_table` table
  GROUP BY
    user_id)

Upvotes: 0

Views: 204

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

Consider below approach

select as value array_agg(t order by updated_at desc limit 1)[offset(0)]
from `project.dataset.table` t
group by user_id, created_at        

If applied to sample data in your question - output is

enter image description here

Upvotes: 1

Related Questions