Reputation: 9
I am computing for the average lead time after outliers are removed. I have both z-scores and standard deviation to do the calculation.
I have used this sql query:
SELECT *
FROM (
SELECT
ROUND(AVG(DATEDIFF(shipped_date, order_date)),2) AS actual_ave_lead_time,
DATEDIFF(shipped_date, order_date) - AVG(DATEDIFF(shipped_date, order_date))/
STDDEV(DATEDIFF(shipped_date, order_date)) AS zscore
FROM orders
) AS score_table
WHERE zscore BETWEEN zscore<1.96 AND >-.96;
I am expecting to get the overall average of the actual_ave_lead_time.
Upvotes: 0
Views: 94
Reputation: 12998
To calculate your z-scores you need to run the calculation (z = (x-μ)/σ
) per order. The innermost sub-query calculates the AVG()
and STDDEV()
for the full set of orders, which is then joined to the orders to calculate the z-scores, which can then be used to exclude the outliers from the outermost AVG()
-
SELECT AVG(lead_time)
FROM (
SELECT
DATEDIFF(shipped_date, order_date) AS lead_time,
(DATEDIFF(shipped_date, order_date) - avg_lead_time) / stddev_lead_time AS zscore
FROM orders o
JOIN (
SELECT
AVG(DATEDIFF(shipped_date, order_date)) AS avg_lead_time,
STDDEV(DATEDIFF(shipped_date, order_date)) AS stddev_lead_time
FROM orders
) s
HAVING zscore BETWEEN -0.96 AND 1.96
) t;
If you are using MySQL 8 you could use the aggregate functions as window functions -
SELECT AVG(lead_time)
FROM (
SELECT
DATEDIFF(shipped_date, order_date) AS lead_time,
(DATEDIFF(shipped_date, order_date) - AVG(DATEDIFF(shipped_date, order_date)) OVER()) / STDDEV(DATEDIFF(shipped_date, order_date)) OVER() AS zscore
FROM orders
) t
WHERE zscore BETWEEN -0.96 AND 1.96;
Without example data, this is untested but I think it is correct.
Upvotes: 2