Reputation: 205
I am trying to filter outliers in a MySQL dataset, without success. I'd like to calculate the average of a column, grouped by date, excluding values that fall more than 2 standard deviations from the mean.
An example table:
+---------------------+----------+
| date_time           | duration |
+---------------------+----------+
| 2015-01-01 00:01:00 |        1 |
| 2015-01-01 00:02:00 |        2 |
| 2015-01-01 00:02:20 |       23 |
| 2015-01-01 00:03:10 |        3 |
| 2015-01-02 00:01:00 |        3 |
| 2015-01-02 00:02:00 |        4 |
| 2015-01-02 00:02:20 |       52 |
| 2015-01-02 00:03:10 |        5 |
...
How can I get an average of the duration column excluding the outliers "23" and "52" which are well outside of 2 standard deviations from the mean?
I'd like to get a result like:
+------------+---------+
| date       | average |
+------------+---------+
| 2015-01-01 |       2 |
| 2015-01-02 |       4 |
I think I need 2 separate queries combined with a join, or a subquery, but I can't figure it out.
Upvotes: 2
Views: 3459
Reputation: 48357
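-- Average of the values lying within 2 standard deviations of the overall mean
SELECT AVG(yt.value)
FROM yourtable yt
INNER JOIN (
    -- overall mean and standard deviation, computed once
    SELECT AVG(value) AS avrg, STDDEV(value) AS stdv
    FROM yourtable
) ilv
ON yt.value BETWEEN ilv.avrg - 2 * ilv.stdv AND ilv.avrg + 2 * ilv.stdv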
I can't give you an answer like your example as your example doesn't make much sense.
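That said, if the goal is the per-date breakdown, the same join can probably be grouped by day. This is only a rough sketch: it assumes the table is called `yourtable` (a placeholder) and has the `date_time` and `duration` columns from the question, and it computes the mean and standard deviation separately for each day rather than over the whole table.

```sql
-- Sketch only: yourtable is a placeholder name; date_time and duration
-- are the columns shown in the question.
SELECT ilv.day          AS `date`,
       AVG(yt.duration) AS average
FROM yourtable yt
INNER JOIN (
    -- per-day mean and population standard deviation
    SELECT DATE(date_time)  AS day,
           AVG(duration)    AS avrg,
           STDDEV(duration) AS stdv
    FROM yourtable
    GROUP BY DATE(date_time)
) ilv
    ON DATE(yt.date_time) = ilv.day
-- keep only rows within 2 standard deviations of that day's mean
WHERE yt.duration BETWEEN ilv.avrg - 2 * ilv.stdv AND ilv.avrg + 2 * ilv.stdv
GROUP BY ilv.day;
```

Note that `STDDEV()` in MySQL is the population standard deviation. With only the four rows shown per day, 23 and 52 each sit about 1.7 standard deviations from their day's mean, so they would not be filtered out; the cutoff only starts to bite once each day has more rows.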
Upvotes: 4