Reputation: 77
select *,
(select sum(my_val) from my_data data1
where my_date > date_sub(CAST(data2.start_date as DATE), 180)
and my_date < date_sub(CAST(data2.start_date as DATE), 90)
and data2.id = data1.id ) as changed_1,
(select sum(my_val) from my_data data1
where my_date > date_sub(CAST(data2.start_date as DATE), 160)
and my_date < date_sub(CAST(data2.start_date as DATE), 150)
and data2.id = data1.id ) as changed_2,
(select sum(my_val) from my_data data1
where my_date > date_sub(CAST(data2.start_date as DATE), 80)
and my_date < date_sub(CAST(data2.start_date as DATE), 45)
and data2.id = data1.id ) as changed_3
...<insert a lot of other aggregations based on different date ranges>...
from otherdata_data data2
I am trying to execute the above sql, however I get the following error
The correlated scalar subquery can only contain equality predicates
I have 50 or so aggregate sub queries all based on different parameters between my_data
and otherdata_data
each aggregation will be on a different date range and some other possible factors. Is it possible to do this in spark sql since it doesnt seem to like non-equalities in sub queries of a predicate?
Upvotes: 0
Views: 145
Reputation: 1269873
Use conditional aggregation:
select data1.id,
sum(case when my_date > date_sub(CAST(data2.start_date as DATE), 180) and my_date < date_sub(CAST(data2.start_date as DATE), 90)
then my_val
end),
sum(case when my_date > date_sub(CAST(data2.start_date as DATE), 160) and my_date < date_sub(CAST(data2.start_date as DATE), 150)
then my_val
end),
. . .
from data1
group by data1.id;
You can then join in other tables (perhaps using this as a subquery).
Upvotes: 1
Reputation: 24568
how about chnage your query to this:
select
data2.*,
sum(case when data1.my_date > data1.date_sub(CAST(data2.start_date as DATE), 180) and data1.my_date < data1.date_sub(CAST(data2.start_date as DATE), 90) then my_val end) as changed_1,
sum(case when data1.my_date > data1.date_sub(CAST(data2.start_date as DATE), 160) and data1.my_date < data1.date_sub(CAST(data2.start_date as DATE), 150) then my_val end) as changed_2,
sum(case when data1.my_date > data1.date_sub(CAST(data2.start_date as DATE), 80) and data1.my_date < data1.date_sub(CAST(data2.start_date as DATE), 45) then my_val end) as changed_3
from
otherdata_data data2
join my_data data1
on data2.id = data1.id
group by <all columns of otherdata_data table>
Upvotes: 1