Reputation: 75
So I have a table of purchases:
user_id
purchase_time
amount
And I have a table of the user activities on the website:
user_id
location
browse_time
How can I join the purchases table with the activities table to get the most recent browse_time activity without going over the purchase_time?
For example if I have for Purchase Table:
user_id Purchase_time amount
------- ------------------- ------
1 2012-12-13 12:30:00 $20
2 2012-12-14 23:00:00 $50
And I have the Activities Table:
user_id browse_time location
------- ----------- ---------
1 2012-12-14 23:00:00 Product 3
1 2012-12-13 12:00:00 Product 1
1 2012-12-13 11:30:00 Product 2
2 2012-12-15 00:00:00 Product 5
2 2012-12-14 22:30:00 Product 7
2 2012-12-14 20:00:00 Product 6
I would like to have the following output:
user_id purchase_time browse_time location amount
------- ----------------- --------------- ---------- -------
1 2012-12-13 12:30:00 2012-12-13 12:00:00 Product 1 $20
2 2012-12-14 23:00:00 2012-12-14 22:30:00 Product 7 $50
I tried mysql syntax and it didn't work. I know that bigquery does not allow "<" or ">" for join statements on "on". So is that even possible to do it in Big Query?
Upvotes: 3
Views: 2748
Reputation: 2067
So if you do something like this, you'll get a lot more result than you want:
SELECT
user_id
purchase_time
browse_time
location
amount
FROM
purchases pur
JOIN
user_activities uav
ON
pur.user_id = uav.user_id
You want the MOST RECENT of user_activities, so lets make a subquery of the right side of the join:
SELECT
user_id
purchase_time
browse_time
location
amount
FROM
(SELECT
user_id AS user_id
location AS location
MAX(browse_time) AS browse_time
FROM
purchases
GROUP BY user_id,location) pur
JOIN
user_activities uav
ON
pur.user_id = uav.user_id
I hope this will helps you solving the problem.
Upvotes: 1