Zzema
Zzema

Reputation: 307

How to join two tables using partition (or something else)

I have two tables:

And I want to have sales with current categories of products to this day (if category changed today i want to see it to this day and later, before next change)

Please help without many subqueries (tables are heavy)

    TABLE A >>
        uid  name  date
        1   test1  2019-09-01
        2   test1  2019-09-02
        3   test1  2019-09-03
        4   test2  2019-09-01
        5   test2  2019-09-02
        6   test2  2019-09-03

    TABLE B >>
    uid  name  date_change  category
    1   test1  2019-08-30  cat1
    2   test1  2019-09-02  cat2
    4   test1  2019-09-04  cat3
    5   test2  2019-09-01  cat4
    6   test2  2019-09-03  cat5

    TABLE Result >>
    uid  name  date       date_change  category
    1   test1  2019-09-01  2019-08-30  cat1
    2   test1  2019-09-02  2019-09-02  cat2
    3   test1  2019-09-03  2019-09-02  cat2
    4   test2  2019-09-01  2019-09-01  cat4
    5   test2  2019-09-02  2019-09-01  cat4
    6   test2  2019-09-03  2019-09-03  cat5


Upvotes: 0

Views: 54

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

#standardSQL
SELECT a.name, `date`, date_change, category
FROM `project.dataset.tableA` a
LEFT JOIN(
  SELECT name, date_change, category, 
    DATE_SUB(LEAD(date_change, 1, DATE '2100-01-01') OVER(PARTITION BY name ORDER BY date_change), INTERVAL 1 DAY) last_day
  FROM `project.dataset.tableB` 
) b
ON a.name = b.name 
AND a.date BETWEEN b.date_change AND b.last_day
-- ORDER BY name, date_change

If to apply to sample data from your question - result is

Row name    date        date_change category     
1   test1   2019-09-01  2019-08-30  cat1     
2   test1   2019-09-02  2019-09-02  cat2     
3   test1   2019-09-03  2019-09-02  cat2     
4   test2   2019-09-01  2019-09-01  cat4     
5   test2   2019-09-02  2019-09-01  cat4     
6   test2   2019-09-03  2019-09-03  cat5     

Upvotes: 1

GMB
GMB

Reputation: 222482

One option uses a correlated subquery to retrieve the latest cat from b prior to the date of a:

select 
    a.*,
    (
        select cat 
        from b 
        where b.name = a.name and b.date_changed <= a.date 
        order by b.date_changed desc 
        limit 1
    ) cat
from a

Upvotes: 1

Related Questions