Andy B

Reputation: 205

How to filter outliers in MySQL?

I am trying to filter outliers out of a MySQL dataset, without success. I'd like to calculate the average of a column, broken down by date, excluding values that fall more than 2 standard deviations from the mean.

An example table:

+---------------------+----------+
|      date_time      | duration |
+---------------------+----------+
| 2015-01-01 00:01:00 |    1     |
| 2015-01-01 00:02:00 |    2     |
| 2015-01-01 00:02:20 |    23    |
| 2015-01-01 00:03:10 |    3     |
| 2015-01-02 00:01:00 |    3     |
| 2015-01-02 00:02:00 |    4     |
| 2015-01-02 00:02:20 |    52    |
| 2015-01-02 00:03:10 |    5     |   ...

How can I get an average of the duration column excluding the outliers "23" and "52" which are well outside of 2 standard deviations from the mean?

I'd like to get a result like:

+------------+---------+
|   date     | average |
+------------+---------+
| 2015-01-01 |    2    |
| 2015-01-02 |    4    |

I think I need two separate queries combined with a join, or a subquery, but I can't figure it out.

Upvotes: 2

Views: 3459

Answers (1)

symcbean

Reputation: 48357

SELECT AVG(yt.value)
FROM yourtable yt
INNER JOIN (SELECT AVG(value) AS avrg, STDDEV(value) AS stdv
            FROM yourtable) ilv
ON yt.value BETWEEN ilv.avrg - 2 * ilv.stdv AND ilv.avrg + 2 * ilv.stdv;

I can't give you an answer like your example as your example doesn't make much sense.
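
For the per-date breakdown the question asks about, the same join could be extended roughly as follows. This is only a sketch under assumptions: the table name yourtable is hypothetical, the column names date_time and duration are taken from the question's example table, and the mean and standard deviation are computed separately for each day rather than over the whole table.

```sql
-- Sketch only. Assumed schema: yourtable(date_time DATETIME, duration INT).
SELECT ilv.day          AS day,
       AVG(yt.duration) AS average
FROM yourtable yt
INNER JOIN (
    -- Per-day mean and standard deviation.
    -- In MySQL, STDDEV() is the population standard deviation (same as STDDEV_POP()).
    SELECT DATE(date_time)  AS day,
           AVG(duration)    AS avrg,
           STDDEV(duration) AS stdv
    FROM yourtable
    GROUP BY DATE(date_time)
) ilv
    ON DATE(yt.date_time) = ilv.day
   -- Keep only rows within 2 standard deviations of that day's mean.
   AND yt.duration BETWEEN ilv.avrg - 2 * ilv.stdv
                       AND ilv.avrg + 2 * ilv.stdv
GROUP BY ilv.day;
```

Note that if a day really had only the four rows shown in the question, nothing would be filtered: with four values, none can lie more than 1.5 population standard deviations from their mean, so 23 and 52 would stay in. The 2-standard-deviation cutoff only starts excluding rows when there are more rows per day.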

Upvotes: 4
