James Xiang
James Xiang

Reputation: 137

Joining tables that compute values between dates

so I have the two following tables

Table A

Date        num
01-16-15    10
02-20-15    12
03-20-15    13

Table B

Date        Value
01-02-15    100
01-03-15    101
   .         .
01-17-15    102
01-18-15    103
   .         .
02-22-15    104
   .         .
03-20-15    110

And i want to create a table that have the the following output in impala

Date         Value
01-17-15     102*10
01-18-15     103*10
02-22-15     104*12
   .           .
   .           .

So the idea is that we only consider dates between 01-16-15 and 02-20-15, and 02-20-15 and 03-20-15 exclusively. And use the num from the starting date of that period, say 01-16-15, and multiply it by everyday in the period, i.e. 1-16 to 2-20.

I understand it should be done by join but I am not sure how do you join in this case. Thanks!

Upvotes: 1

Views: 69

Answers (2)

Joakim Danielson
Joakim Danielson

Reputation: 52088

This works on MariaDb (MySql) and it's pretty basic so hopefully it works on impala too.

SELECT b.date, b.value * a.num
FROM tableB b, tableA a
WHERE b.date >= a.date
  AND (b.date < (SELECT MIN(c.date) FROM tableA c WHERE c.date > a.date)
       OR NOT EXISTS(SELECT c.date FROM tableA c WHERE c.date > a.date))

The last NOT EXISTS... was needed to include dates after the last date in table A

Update In the revised version of the question the date in B is never larger (after) the last date in A so then the query can be written as

SELECT b.date, b.value * a.num
FROM tableB b, tableA a
WHERE b.date >= a.date
  AND b.date <= (SELECT MIN(c.date) FROM tableA c WHERE c.date > a.date)

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1271003

Hmmm. In standard SQL you can do:

select b.*,
       (select a.num
        from a
        where a.date <= b.date
        order by a.date desc
        fetch first 1 row only
       ) * value as new_value
from b;

I don't think this meets the range conditions, but I don't understand your description of that.

I also don't know if Impala supports correlated subqueries. An alternative is probably faster on complex data:

with ab as (
      select a.date, a.value as a_value, null as b_value, 'a' as which
      from a
      union all
      select b.date, null as a_value, b_value, 'b' as which
      from b
     )
select date, b_value * a_real_value
from (select ab.*,
             max(a_value) over (partition by a_date) as a_real_value
      from (select ab.*,
                   max(a.date) over (order by date, which) as a_date
            from ab
           ) ab
     ) ab
where which = 'b';

Upvotes: 1

Related Questions