Reputation: 647
I wonder if anyone here can help with a BigQuery piece I am working on.
This will need to pull the most recent gplus/currents activity for each individual user in the domain. I have tried the following query, but this pulls every activity for every user:
SELECT
TIMESTAMP_MICROS(time_usec) AS date,
email,
event_type,
event_name
FROM
`bqadminreporting.adminlogtracking.activity`
WHERE
record_type LIKE 'gplus'
ORDER BY
email ASC;
I have tried to use DISTINCT, but I still get multiple entries for the same user. Ideally, I need to do this looking back over 90 day... (So between today and 90 days ago, get the most recent activity for each user - if that makes sense?) which brings me to the issue with another question.
EDIT: Example data and expected output.
Fields: There are over 500 fields, I have just listed the relevant ones
+--------------------------------+---------+----------+
| Field name | Type | Mode |
+--------------------------------+---------+----------+
| time_usec | INTEGER | NULLABLE |
| email | STRING | NULLABLE |
| event_type | STRING | NULLABLE |
| event_name | STRING | NULLABLE |
| record_type | STRING | NULLABLE |
| gplus | RECORD | NULLABLE |
| gplus. log_event_resource_name | STRING | NULLABLE |
| gplus. attachment_type | STRING | NULLABLE |
| gplus. plusone_context | STRING | NULLABLE |
| gplus. post_permalink | STRING | NULLABLE |
| gplus. post_resource_name | STRING | NULLABLE |
| gplus. comment_resource_name | STRING | NULLABLE |
| gplus. post_visibility | STRING | NULLABLE |
| gplus. user_type | STRING | NULLABLE |
| gplus. post_author_name | STRING | NULLABLE |
+--------------------------------+---------+----------+
Output from my query: This is the output I get when running my query above.
+-----+--------------------------------+------------------+----------------+----------------+
| Row | date | email | event_type | event_name |
+-----+--------------------------------+------------------+----------------+----------------+
| 1 | 2020-01-30 07:10:19.088 UTC | [email protected] | post_change | create_post |
| 2 | 2020-03-03 08:47:25.086485 UTC | [email protected] | coment_change | create_comment |
| 3 | 2020-03-23 09:10:09.522 UTC | [email protected] | post_change | create_post |
| 4 | 2020-03-23 09:49:00.337 UTC | [email protected] | plusone_change | remove_plusone |
| 5 | 2020-03-23 09:48:10.461 UTC | [email protected] | plusone_change | add_plusone |
| 6 | 2020-01-30 10:04:29.757005 UTC | [email protected] | coment_change | create_comment |
| 7 | 2020-03-28 08:52:50.711359 UTC | [email protected] | coment_change | create_comment |
| 8 | 2020-11-08 10:08:09.161325 UTC | [email protected] | coment_change | create_comment |
| 9 | 2020-04-21 15:28:10.022683 UTC | [email protected] | coment_change | create_comment |
| 10 | 2020-03-28 09:37:28.738863 UTC | [email protected] | coment_change | create_comment |
+-----+--------------------------------+------------------+----------------+----------------+
Desired result: Only 1 row of data per user, showing only the most recent event.
+-----+--------------------------------+------------------+----------------+----------------+
| Row | date | email | event_type | event_name |
+-----+--------------------------------+------------------+----------------+----------------+
| 1 | 2020-03-23 09:49:00.337 UTC | [email protected] | plusone_change | remove_plusone |
| 2 | 2020-11-08 10:08:09.161325 UTC | [email protected] | coment_change | create_comment |
| 3 | 2020-04-21 15:28:10.022683 UTC | [email protected] | coment_change | create_comment |
| 4 | 2020-03-28 09:37:28.738863 UTC | [email protected] | coment_change | create_comment |
+-----+--------------------------------+------------------+----------------+----------------+
Upvotes: 1
Views: 3800
Reputation: 10162
Use array_agg:
select
email,
array_agg(STRUCT(TIMESTAMP_MICROS(time_usec) as date, event_type, event_name) ORDER BY time_usec desc LIMIT 1)[OFFSET(0)].*
from `bqadminreporting.adminlogtracking.activity`
where
record_type LIKE 'gplus'
and time_usec > unix_micros(timestamp_sub(current_timestamp(), interval 90 day))
group by email
order by email
Test example:
with mytable as (
select timestamp '2020-01-30 07:10:19.088 UTC' as date, '[email protected]' as email, 'post_change' as event_type, 'create_post' as event_name union all
select timestamp '2020-03-03 08:47:25.086485 UTC', '[email protected]', 'coment_change', 'create_comment' union all
select timestamp '2020-03-23 09:10:09.522 UTC', '[email protected]', 'post_change', 'create_post' union all
select timestamp '2020-03-23 09:49:00.337 UTC', '[email protected]', 'plusone_change', 'remove_plusone' union all
select timestamp '2020-03-23 09:48:10.461 UTC', '[email protected]', 'plusone_change', 'add_plusone' union all
select timestamp '2020-01-30 10:04:29.757005 UTC', '[email protected]', 'coment_change', 'create_coment' union all
select timestamp '2020-03-28 08:52:50.711359 UTC', '[email protected]', 'coment_change', 'create_coment' union all
select timestamp '2020-11-08 10:08:09.161325 UTC', '[email protected]', 'coment_change', 'create_coment' union all
select timestamp '2020-04-21 15:28:10.022683 UTC', '[email protected]', 'coment_change', 'create_coment' union all
select timestamp '2020-03-28 09:37:28.738863 UTC', '[email protected]', 'coment_change', 'create_coment'
)
select
email,
array_agg(STRUCT(date, event_type, event_name) ORDER BY date desc LIMIT 1)[OFFSET(0)].*
from mytable
group by email
Upvotes: 1
Reputation: 1269693
If you want all columns from the most recent row, you can use this BigQuery syntax:
select array_agg(t order by date desc limit 1)[ordinal(1)].*
from mytable t
group by t.email;
If you want specific columns, then Sergey's solution might be simpler.
Upvotes: 1
Reputation: 699
An alternative way to solve your problem is :-
select * from (
select
max (date1) max_dt
from mytable
group by date(date1)), mytable
where date1=max_dt
Upvotes: 0