roboes
roboes

Reputation: 401

Join sum to closest timestamp once up to interval cap

I am trying to join a site_interactions table with a store_transactions table. For this, I want that the store_transactions.sales_amount for a given username gets attached to the closest site_interactions.timestamp match, at most one time and up to 7 days of the site_interactions.timestamp variable.

site_interaction table:

username    timestamp
John        01.01.2020 15:00:00
John        02.01.2020 11:30:00
Sarah       03.01.2020 12:00:00

store_transactions table:

username    timestamp             sales_amount
John        02.01.2020 16:00:00   45
John        03.01.2020 16:00:00   70
John        09.01.2020 16:00:00   15
Sarah       02.01.2020 09:00:00   35
Tim         02.01.2020 10:00:00   60

Desired output:

 username   timestamp             sales_amount
 John       01.01.2020 15:00:00   NULL
 John       02.01.2020 11:30:00   115
 Sarah      03.01.2020 12:00:00   NULL

Explanation:

John has 3 entries/transactions in the store_transactions table. The first and the second purchase were realized within the 7 days interval/limit, and the sum of these two transactions (45 + 70 = 115) were attached/joined to the closest and nearest match only once - i.e. to John's second interaction (timestamp = 02.01.2020 11:30:00). John's third transactions was not attached to any site interaction, because it exceeds the 7 days interval (including the time).

Sarah has one transaction realized before her interaction with the site. Thus her sales_amount of 35 was not attached to the site_interaction table.

Last, Tim's transaction was not attached anywhere - because this username does not show in the site_interaction table.

Here a link of the tables: https://rextester.com/RKSUK73038

Thanks in advance!

Upvotes: 0

Views: 33

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172954

Below is for BigQuery Standard SQL

#standardSQL
select i.username, i.timestamp, 
  sum(sales_amount) as sales_amount
from (
  select username, timestamp, 
    ifnull(lead(timestamp) over(partition by username order by timestamp), timestamp_add(timestamp, interval 7 day)) next_timestamp
  from `project.dataset.site_interaction`
) i
left join `project.dataset.store_transactions` t 
on i.username = t.username
and t.timestamp >= i.timestamp 
and t.timestamp < least(next_timestamp, timestamp_add(i.timestamp, interval 7 day))
group by username, timestamp

if to apply to sample data from your question - output is

enter image description here

Upvotes: 1

Related Questions