user3667896
user3667896

Reputation: 13

BigQuery most recent events per user in a logfile

This must be simple but it's got me stumped in BQ

Simple log file:

mysql> select * from wow;
+------+---------------+------------+
| id   | string        | epoch      |
+------+---------------+------------+
| id3  | a new user    | 1400783480 |
| id1  | a log event 1 | 1400783492 |
| id1  | a log event 2 | 1400783499 |  
| id1  | a log event 3 | 1400783503 |
| id2  | a log event 1 | 1400783510 |
| id2  | a log event 2 | 1400783516 |
+------+---------------+------------+

I want to do the equivalent in BQ to find the last event for each Id:

mysql> select * from (select * from wow as B order by epoch desc) as A group by id;
+------+---------------+------------+
| id   | string        | epoch      |
+------+---------------+------------+
| id1  | a log event 3 | 1400783503 |
| id2  | a log event 2 | 1400783516 |
| id3  | a new user    | 1400783480 |
+------+---------------+------------+
3 rows in set (0.00 sec)

Any tips/helps much appreciated.

Upvotes: 1

Views: 921

Answers (2)

N.N.
N.N.

Reputation: 3172

I'd rather avoid Joins, as there are more elegant ways to do this:

  1. If your data set is not too big, use rownumber():

    Select * from (Select *,row_number() over (partition by id order by epoc desc) as RNB from t) where RNB=1

  2. for larger data sets, you can manipulate the max(time) and the log string to fid the last log entry per user.

    Select id, substring(max(concat(string(epoch),logstring)),10) from t group each by id

Hope this helps.

Upvotes: 2

Craig Citro
Craig Citro

Reputation: 6625

The simplest way is probably to do a query with a GROUP BY and MIN to find the (id, epoch) pairs you want, and then join it back to the original table:

SELECT t.*
FROM 
    (SELECT id, MAX(epoch) AS max_epoch FROM [tmp.so1] GROUP BY id) AS keys 
  JOIN 
    [tmp.so1] AS t 
  ON keys.id = t.id AND keys.max_epoch = t.epoch 
ORDER BY t.id

For your data, this indeed gives back:

+------+---------------+------------+
| t_id |     t_msg     |  t_epoch   |
+------+---------------+------------+
| id1  | a log event 3 | 1400783503 |
| id2  | a log event 2 | 1400783516 |
| id3  | a new user    | 1400783480 |
+------+---------------+------------+

Three notes:

  • I used the handy t.* shorthand for the results, but you can actually map them through if you want the original names
  • If your table grows large, you'll probably want to switch to a JOIN EACH.
  • In the case that you have multiple entries with the minimal epoch for a given ID, you'll get doubled rows.

Upvotes: 2

Related Questions