Wizuriel
Wizuriel

Reputation: 3747

DB2 query to find average sale for each item 1 year previous

Having some trouble figuring out how to make these query.

In general I have a table with

what I want to do is have a view that shows for each sales item how much the employee on average sells for 1 year previous of the sale_date.

example: Suppose I have this in the sales table

sales_ID    employee_id    sale_date    sale_price
1           Bob            2016/06/10    100
2           Bob            2016/01/01    75
3           Bob            2014/01/01    475
4           Bob            2015/12/01    100
5           Bob            2016/05/01    200
6           Fred           2016/01/01    30
7           Fred           2015/05/01    50

for sales_id 1 record I want to pull all sales from Bob by 1 year up to the month of the sale (so 2015-05-01 to 2016-05-31 which has 3 sales for 75, 100, 200) so the final output would be

sales_ID    employee_id    sale_date    sale_price    avg_sale
1           Bob            2016/06/10    100          125
2           Bob            2016/01/01    75           275
3           Bob            2014/01/01    475          null
4           Bob            2015/12/01    100          475
5           Bob            2016/05/01    200          87.5
6           Fred           2016/01/01    30           50
7           Fred           2015/05/01    50           null

What I tried doing is something like this

select a.sales_ID, a.sale_price, a.employee_ID, a.sale_date, b.avg_price
from sales a
left join (
     select employee_id, avg(sale_price) as avg_price
     from sales 
     where sale_date between Date(VARCHAR(YEAR(a.sale_date)-1) ||'-'|| VARCHAR(MONTH(a.sale_date)-1) || '-01')
                  and Date(VARCHAR(YEAR(a.sale_date)) ||'-'|| VARCHAR(MONTH(a.sale_date)) || '-01') -1 day
    group by employee_id
) b on a.employee_id = b.employee_id

which DB2 doesn't like using the parent table a in the sub query, but I can't think of how to properly write this query. any thoughts?

Upvotes: 1

Views: 575

Answers (2)

Lennart - Slava Ukraini
Lennart - Slava Ukraini

Reputation: 7181

You can almost fix your original query by doing a LATERAL join. Lateral allows you to reference previously declared tables as in:

select a.sales_ID, a.sale_price, a.employee_ID, a.sale_date, b.avg_price
from sales a
left join LATERAL (
     select employee_id, avg(sale_price) as avg_price
     from sales 
     where sale_date between Date(VARCHAR(YEAR(a.sale_date)-1) ||'-'|| VARCHAR(MONTH(a.sale_date)-1) || '-01')
                  and Date(VARCHAR(YEAR(a.sale_date)) ||'-'|| VARCHAR(MONTH(a.sale_date)) || '-01') -1 day
    group by employee_id
) b on a.employee_id = b.employee_id

However, I get an syntax error from your date arithmetic, so using @Utsav solution for this yields:

select a.sales_ID, a.sale_price, a.employee_ID, a.sale_date, b.avg_price
from sales a
left join lateral (
    select employee_id, avg(sale_price) as avg_price
    from sales b
    where a.employee_id = b.employee_id 
     and b.sale_date between  a.sale_date - 365 and a.sale_date -1
    group by employee_id
) b on a.employee_id = b.employee_id

Since we already pushed the predicate inside the LATERAL join, it is strictly speaking not necessary to use the on clause:

select a.sales_ID, a.sale_price, a.employee_ID, a.sale_date, b.avg_price
from sales a
left join lateral (
    select employee_id, avg(sale_price) as avg_price
    from sales b
    where a.employee_id = b.employee_id 
     and b.sale_date between  a.sale_date - 365 and a.sale_date -1
    group by employee_id
) b on 1=1

By using a LATERAL join we removed one access against the sales table. A comparison of the plans show:

No LATERAL Join

Access Plan:

        Total Cost:             20,4571
        Query Degree:           1

            Rows 
           RETURN
           (   1)
            Cost 
             I/O 
             |
              7 
           >MSJOIN
           (   2)
           20,4565 
              3 
         /---+----\
        7        0,388889 
     TBSCAN       FILTER
     (   3)       (   6)
     6,81572      13,6402 
        1            2 
       |            |
        7         2,72222 
     SORT         GRPBY 
     (   4)       (   7)
     6,81552      13,6397 
        1            2 
       |            |
        7         2,72222 
     TBSCAN       TBSCAN
     (   5)       (   8)
     6,81488      13,6395 
        1            2 
       |            |
        7         2,72222 
 TABLE: LELLE     SORT  
      SALES       (   9)
       Q6         13,6391 
                     2 
                    |
                  2,72222 
                  HSJOIN
                  (  10)
                  13,6385 
                     2 
              /-----+------\
             7                7 
          TBSCAN           TBSCAN
          (  11)           (  12)
          6,81488          6,81488 
             1                1 
            |                |
             7                7 
      TABLE: LELLE     TABLE: LELLE   
           SALES            SALES
            Q2               Q1

LATERAL Join

Access Plan:

    Total Cost:         13,6565
    Query Degree:       1

            Rows
           RETURN
           (   1)
            Cost
             I/O
             |
              7
          >^NLJOIN
          (   2)
           13,6559
              2
         /---+----\
        7          0,35
     TBSCAN       GRPBY
     (   3)       (   4)
     6,81488      6,81662
        1            1
       |            |
        7          0,35
 TABLE: LELLE     TBSCAN
      SALES       (   5)
       Q5         6,81656
                     1
                    |
                     7
              TABLE: LELLE
                   SALES
                    Q1

Window functions with framing

DB2 does not yet support range frames over dates, but by using a clever trick by @mustaccio in:

https://dba.stackexchange.com/questions/141263/what-is-the-meaning-of-order-by-x-range-between-n-preceding-if-x-is-a-dat

we can actually use only one table access and solve the problem:

select a.sales_ID, a.sale_price, a.employee_ID, a.sale_date
     , avg(sale_price) over (partition by employee_id 
                             order by julian_day(a.sale_date) 
                             range between 365 preceding
                                       and 1 preceding
                            ) as avg_price 
from sales a 

Access Plan:

    Total Cost:             6.8197
    Query Degree:           1

      Rows 
     RETURN
     (   1)
      Cost 
       I/O 
       |
        7 
     TBSCAN
     (   2)
     6.81753 
        1 
       |
        7 
     SORT  
     (   3)
     6.81703 
        1 
       |
        7 
     TBSCAN
     (   4)
     6.81488 
        1 
       |
        7 
 TABLE: LELLE   
      SALES
       Q1

Upvotes: 1

Utsav
Utsav

Reputation: 8143

Ok. I think I figured it out. Please note 3 things.

  1. I couldn't test it in DB2, so I used Oracle. But syntax would be more or less same.
  2. I didn't use your 1 year logic exactly. I am counting current_date minus 365 days, but you can change the between part in where clause in inner query, as you mentioned in the question.
  3. The expected output you mentioned is incorrect. So for every sale_id, I took the date, found the employee_id, took all the sales of that employee for last 1 year, excluding the current date, and then took average. If you want to change it, you can change the where clause in subquery.

    select t1.*,t2.avg_sale 
    from 
    sales t1
    left join 
    (
        select a.sales_id
        ,avg(b.sale_price) as avg_sale
        from sales a
            inner join 
        sales b
         on a.employee_id=b.employee_id
            where b.sale_date between  a.sale_date - 365 and  a.sale_date -1
        group by a.sales_id
    ) t2
    on t1.sales_id=t2.sales_id
    order by t1.sales_id
    

Output

+----------+-------------+-------------+------------+----------+
| SALES_ID | EMPLOYEE_ID |  SALE_DATE  | SALE_PRICE | AVG_SALE |
+----------+-------------+-------------+------------+----------+
|        1 | Bob         | 10-JUN-2016 |        100 | 125      |
|        2 | Bob         | 01-JAN-2016 |         75 | 100      |
|        3 | Bob         | 01-JAN-2014 |        475 |          |
|        4 | Bob         | 01-DEC-2015 |        100 |          |
|        5 | Bob         | 01-MAY-2016 |        200 | 87.5     |
|        6 | Fred        | 01-JAN-2016 |         30 | 50       |
|        7 | Fred        | 01-MAY-2015 |         50 |          |
+----------+-------------+-------------+------------+----------+

Upvotes: 1

Related Questions