Reputation: 3747
Having some trouble figuring out how to make these query.
In general I have a table with
what I want to do is have a view that shows for each sales item how much the employee on average sells for 1 year previous of the sale_date.
example: Suppose I have this in the sales table
sales_ID employee_id sale_date sale_price
1 Bob 2016/06/10 100
2 Bob 2016/01/01 75
3 Bob 2014/01/01 475
4 Bob 2015/12/01 100
5 Bob 2016/05/01 200
6 Fred 2016/01/01 30
7 Fred 2015/05/01 50
for sales_id 1 record I want to pull all sales from Bob by 1 year up to the month of the sale (so 2015-05-01 to 2016-05-31 which has 3 sales for 75, 100, 200) so the final output would be
sales_ID employee_id sale_date sale_price avg_sale
1 Bob 2016/06/10 100 125
2 Bob 2016/01/01 75 275
3 Bob 2014/01/01 475 null
4 Bob 2015/12/01 100 475
5 Bob 2016/05/01 200 87.5
6 Fred 2016/01/01 30 50
7 Fred 2015/05/01 50 null
What I tried doing is something like this
select a.sales_ID, a.sale_price, a.employee_ID, a.sale_date, b.avg_price
from sales a
left join (
select employee_id, avg(sale_price) as avg_price
from sales
where sale_date between Date(VARCHAR(YEAR(a.sale_date)-1) ||'-'|| VARCHAR(MONTH(a.sale_date)-1) || '-01')
and Date(VARCHAR(YEAR(a.sale_date)) ||'-'|| VARCHAR(MONTH(a.sale_date)) || '-01') -1 day
group by employee_id
) b on a.employee_id = b.employee_id
which DB2 doesn't like using the parent table a in the sub query, but I can't think of how to properly write this query. any thoughts?
Upvotes: 1
Views: 575
Reputation: 7181
You can almost fix your original query by doing a LATERAL
join. Lateral allows you to reference previously declared tables as in:
select a.sales_ID, a.sale_price, a.employee_ID, a.sale_date, b.avg_price
from sales a
left join LATERAL (
select employee_id, avg(sale_price) as avg_price
from sales
where sale_date between Date(VARCHAR(YEAR(a.sale_date)-1) ||'-'|| VARCHAR(MONTH(a.sale_date)-1) || '-01')
and Date(VARCHAR(YEAR(a.sale_date)) ||'-'|| VARCHAR(MONTH(a.sale_date)) || '-01') -1 day
group by employee_id
) b on a.employee_id = b.employee_id
However, I get an syntax error from your date arithmetic, so using @Utsav solution for this yields:
select a.sales_ID, a.sale_price, a.employee_ID, a.sale_date, b.avg_price
from sales a
left join lateral (
select employee_id, avg(sale_price) as avg_price
from sales b
where a.employee_id = b.employee_id
and b.sale_date between a.sale_date - 365 and a.sale_date -1
group by employee_id
) b on a.employee_id = b.employee_id
Since we already pushed the predicate inside the LATERAL
join, it is strictly speaking not necessary to use the on clause:
select a.sales_ID, a.sale_price, a.employee_ID, a.sale_date, b.avg_price
from sales a
left join lateral (
select employee_id, avg(sale_price) as avg_price
from sales b
where a.employee_id = b.employee_id
and b.sale_date between a.sale_date - 365 and a.sale_date -1
group by employee_id
) b on 1=1
By using a LATERAL
join we removed one access against the sales table. A comparison of the plans show:
Access Plan:
Total Cost: 20,4571
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
7
>MSJOIN
( 2)
20,4565
3
/---+----\
7 0,388889
TBSCAN FILTER
( 3) ( 6)
6,81572 13,6402
1 2
| |
7 2,72222
SORT GRPBY
( 4) ( 7)
6,81552 13,6397
1 2
| |
7 2,72222
TBSCAN TBSCAN
( 5) ( 8)
6,81488 13,6395
1 2
| |
7 2,72222
TABLE: LELLE SORT
SALES ( 9)
Q6 13,6391
2
|
2,72222
HSJOIN
( 10)
13,6385
2
/-----+------\
7 7
TBSCAN TBSCAN
( 11) ( 12)
6,81488 6,81488
1 1
| |
7 7
TABLE: LELLE TABLE: LELLE
SALES SALES
Q2 Q1
Access Plan:
Total Cost: 13,6565
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
7
>^NLJOIN
( 2)
13,6559
2
/---+----\
7 0,35
TBSCAN GRPBY
( 3) ( 4)
6,81488 6,81662
1 1
| |
7 0,35
TABLE: LELLE TBSCAN
SALES ( 5)
Q5 6,81656
1
|
7
TABLE: LELLE
SALES
Q1
DB2 does not yet support range frames over dates, but by using a clever trick by @mustaccio in:
we can actually use only one table access and solve the problem:
select a.sales_ID, a.sale_price, a.employee_ID, a.sale_date
, avg(sale_price) over (partition by employee_id
order by julian_day(a.sale_date)
range between 365 preceding
and 1 preceding
) as avg_price
from sales a
Access Plan:
Total Cost: 6.8197
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
7
TBSCAN
( 2)
6.81753
1
|
7
SORT
( 3)
6.81703
1
|
7
TBSCAN
( 4)
6.81488
1
|
7
TABLE: LELLE
SALES
Q1
Upvotes: 1
Reputation: 8143
Ok. I think I figured it out. Please note 3 things.
between
part in where
clause in inner query, as you mentioned in the question.The expected output you mentioned is incorrect. So for every sale_id
, I took the date, found the employee_id
, took all the sales of that employee for last 1 year, excluding the current date, and then took average. If you want to change it, you can change the where clause in subquery.
select t1.*,t2.avg_sale
from
sales t1
left join
(
select a.sales_id
,avg(b.sale_price) as avg_sale
from sales a
inner join
sales b
on a.employee_id=b.employee_id
where b.sale_date between a.sale_date - 365 and a.sale_date -1
group by a.sales_id
) t2
on t1.sales_id=t2.sales_id
order by t1.sales_id
Output
+----------+-------------+-------------+------------+----------+
| SALES_ID | EMPLOYEE_ID | SALE_DATE | SALE_PRICE | AVG_SALE |
+----------+-------------+-------------+------------+----------+
| 1 | Bob | 10-JUN-2016 | 100 | 125 |
| 2 | Bob | 01-JAN-2016 | 75 | 100 |
| 3 | Bob | 01-JAN-2014 | 475 | |
| 4 | Bob | 01-DEC-2015 | 100 | |
| 5 | Bob | 01-MAY-2016 | 200 | 87.5 |
| 6 | Fred | 01-JAN-2016 | 30 | 50 |
| 7 | Fred | 01-MAY-2015 | 50 | |
+----------+-------------+-------------+------------+----------+
Upvotes: 1