Reputation: 105
I have information about sales per day. For example:
Date - Product - Amount
01-07-2020 - A - 10
01-03-2020 - A - 20
01-02-2020 - B - 10
Now I would like to know the average sales per day and the standard deviation for the last year. For average I can just count the number of entries per item, and then count 365-amount of entries and take that many 0's, but I wonder what the best way is to calculate the standard deviation while incorporating the 0's for the days there are not sales.
Upvotes: 0
Views: 248
Reputation: 168232
Use a hierarchical (or recursive) query to generate daily dates for the year and then use a PARTITION OUTER JOIN
to join it to your product data then you can find the average and standard deviation with the AVG
and STDDEV
aggregation functions and use COALESCE
to fill in NULL
values with zeroes:
WITH start_date ( dt ) AS (
SELECT DATE '2020-01-01' FROM DUAL
),
calendar ( dt ) AS (
SELECT dt + LEVEL - 1
FROM start_date
CONNECT BY dt + LEVEL - 1 < ADD_MONTHS( dt, 12 )
)
SELECT product,
AVG( COALESCE( amount, 0 ) ) AS average_sales_per_day,
STDDEV( COALESCE( amount, 0 ) ) AS stddev_sales_per_day
FROM calendar c
LEFT OUTER JOIN (
SELECT t.*
FROM test_data t
INNER JOIN start_date s
ON (
s.dt <= t."DATE"
AND t."DATE" < ADD_MONTHS( s.dt, 12 )
)
) t
PARTITION BY ( t.product )
ON ( c.dt = t."DATE" )
GROUP BY product
So, for your sample data:
CREATE TABLE test_data ( "DATE", Product, Amount ) AS
SELECT DATE '2020-07-01', 'A', 10 FROM DUAL UNION ALL
SELECT DATE '2020-03-01', 'A', 20 FROM DUAL UNION ALL
SELECT DATE '2020-02-01', 'B', 10 FROM DUAL;
This outputs:
PRODUCT | AVERAGE_SALES_PER_DAY | STDDEV_SALES_PER_DAY :------ | ----------------------------------------: | ----------------------------------------: A | .0819672131147540983606557377049180327869 | 1.16752986363678031669548047505759328696 B | .027322404371584699453551912568306010929 | .5227083734893166933219264686616717636897
db<>fiddle here
Upvotes: 1