user1932483
user1932483

Reputation: 75

bigquery - left join on most recent record in the right table without going over timestamp of left table

So I have a table of purchases:
user_id
purchase_time
amount

And I have a table of the user activities on the website:
user_id
location
browse_time

How can I join the purchases table with the activities table to get the most recent browse_time activity without going over the purchase_time?

For example if I have for Purchase Table:

user_id     Purchase_time          amount
-------     -------------------    ------
1           2012-12-13 12:30:00    $20
2           2012-12-14 23:00:00    $50

And I have the Activities Table:

user_id     browse_time            location
-------     -----------            ---------
1           2012-12-14 23:00:00    Product 3
1           2012-12-13 12:00:00    Product 1
1           2012-12-13 11:30:00    Product 2
2           2012-12-15 00:00:00    Product 5
2           2012-12-14 22:30:00    Product 7
2           2012-12-14 20:00:00    Product 6

I would like to have the following output:

user_id    purchase_time           browse_time           location     amount
-------    -----------------       ---------------       ----------   -------
1           2012-12-13 12:30:00    2012-12-13 12:00:00   Product 1    $20
2           2012-12-14 23:00:00    2012-12-14 22:30:00   Product 7    $50   

I tried mysql syntax and it didn't work. I know that bigquery does not allow "<" or ">" for join statements on "on". So is that even possible to do it in Big Query?

Upvotes: 3

Views: 2748

Answers (1)

Balazs Gunics
Balazs Gunics

Reputation: 2067

So if you do something like this, you'll get a lot more result than you want:

SELECT
    user_id
    purchase_time
    browse_time
    location
    amount
FROM
    purchases pur
JOIN
    user_activities uav
ON
    pur.user_id = uav.user_id

You want the MOST RECENT of user_activities, so lets make a subquery of the right side of the join:

SELECT
    user_id
    purchase_time
    browse_time
    location
    amount
FROM
    (SELECT 
         user_id AS user_id 
         location AS location
         MAX(browse_time) AS browse_time 
     FROM
         purchases 
     GROUP BY user_id,location) pur
JOIN
    user_activities uav
ON
    pur.user_id = uav.user_id

I hope this will helps you solving the problem.

Upvotes: 1

Related Questions