Reputation: 784
I have a Bigquery Table with 3 important columns:
user_id
, created_at
, updated_at
.
Each user_id
can have several records, for example:
+----------+------------+-------------+-----------------+
| user_id | created_at | updated_at | task_name |
+----------+------------+-------------+-----------------+
| user_1 | 2020-01-01 | 2020-01-02 | some_task_name1 |
| user_1 | 2020-01-01 | 2020-01-03 | some_task_name1 |
| user_1 | 2020-03-01 | 2020-03-01 | some_task_name1 |
| user_2 | 2020-01-01 | 2020-01-02 | some_task_name2 |
+----------+------------+-------------+-----------------+
For each user_id
with the same created_at
value, I want to select the record with the latest updated_at
. So for example, in this example the output has to be:
+----------+------------+-------------+-----------------+
| user_id | created_at | updated_at | task_name |
+----------+------------+-------------+-----------------+
| user_1 | 2020-01-01 | 2020-01-03 | some_task_name1 |
| user_1 | 2020-03-01 | 2020-03-01 | some_task_name1 |
| user_2 | 2020-01-01 | 2020-01-02 | some_task_name2 |
+----------+------------+-------------+-----------------+
I was trying this, but I don't know how to add the condition for the same created_at
:
SELECT
agg.table.*
FROM (
SELECT
user_id,
ARRAY_AGG(STRUCT(table)
ORDER BY
updated_at DESC)[SAFE_OFFSET(0)] agg
FROM
`dataset.my_table` table
GROUP BY
user_id)
Upvotes: 0
Views: 204
Reputation: 172994
Consider below approach
select as value array_agg(t order by updated_at desc limit 1)[offset(0)]
from `project.dataset.table` t
group by user_id, created_at
If applied to sample data in your question - output is
Upvotes: 1