Reputation: 137
so I have the two following tables
Table A
Date num
01-16-15 10
02-20-15 12
03-20-15 13
Table B
Date Value
01-02-15 100
01-03-15 101
. .
01-17-15 102
01-18-15 103
. .
02-22-15 104
. .
03-20-15 110
And i want to create a table that have the the following output in impala
Date Value
01-17-15 102*10
01-18-15 103*10
02-22-15 104*12
. .
. .
So the idea is that we only consider dates between 01-16-15 and 02-20-15, and 02-20-15 and 03-20-15 exclusively. And use the num from the starting date of that period, say 01-16-15, and multiply it by everyday in the period, i.e. 1-16 to 2-20.
I understand it should be done by join but I am not sure how do you join in this case. Thanks!
Upvotes: 1
Views: 69
Reputation: 52088
This works on MariaDb (MySql) and it's pretty basic so hopefully it works on impala too.
SELECT b.date, b.value * a.num
FROM tableB b, tableA a
WHERE b.date >= a.date
AND (b.date < (SELECT MIN(c.date) FROM tableA c WHERE c.date > a.date)
OR NOT EXISTS(SELECT c.date FROM tableA c WHERE c.date > a.date))
The last NOT EXISTS... was needed to include dates after the last date in table A
Update In the revised version of the question the date in B is never larger (after) the last date in A so then the query can be written as
SELECT b.date, b.value * a.num
FROM tableB b, tableA a
WHERE b.date >= a.date
AND b.date <= (SELECT MIN(c.date) FROM tableA c WHERE c.date > a.date)
Upvotes: 1
Reputation: 1271003
Hmmm. In standard SQL you can do:
select b.*,
(select a.num
from a
where a.date <= b.date
order by a.date desc
fetch first 1 row only
) * value as new_value
from b;
I don't think this meets the range conditions, but I don't understand your description of that.
I also don't know if Impala supports correlated subqueries. An alternative is probably faster on complex data:
with ab as (
select a.date, a.value as a_value, null as b_value, 'a' as which
from a
union all
select b.date, null as a_value, b_value, 'b' as which
from b
)
select date, b_value * a_real_value
from (select ab.*,
max(a_value) over (partition by a_date) as a_real_value
from (select ab.*,
max(a.date) over (order by date, which) as a_date
from ab
) ab
) ab
where which = 'b';
Upvotes: 1