Michael Dorner

Reputation: 20175

MAD results differ in pandas, scipy, and numpy

I want to compute the MAD (median absolute deviation) which is defined by

MAD = median(|x_i - mean(x)|)

for a list of numbers x

x = list(range(0, 10)) + [1000]

However, the results differ significantly between scipy, pandas, and a hand-made numpy implementation:

from scipy import stats
import pandas as pd
import numpy as np

print(stats.median_absolute_deviation(x, scale=1)) # prints 3.0

print(pd.Series(x).mad()) # prints 164.54

print(np.median(np.absolute(x - np.mean(x)))) # prints 91.0

What is wrong?

Upvotes: 10

Views: 18663

Answers (2)

Mykola Zotko

Reputation: 17884

The median absolute deviation is defined as:

MAD = median(|x_i - median(x)|)

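The mad method in pandas returns the mean absolute deviation instead, and your hand-made version subtracts the mean rather than the median. You can calculate the median absolute deviation with any of the following methods: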

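from scipy import stats
import numpy as np
import pandas as pd

x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

# SciPy (deprecated since SciPy 1.5 in favor of stats.median_abs_deviation)
stats.median_absolute_deviation(x, scale=1)
# 3.0

# NumPy: median of the absolute deviations from the median
np.median(np.absolute(x - np.median(x)))
# 3.0

# pandas: the same computation on a Series
x = pd.Series(x)
(x - x.median()).abs().median()
# 3.0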
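
To see where the three numbers in the question come from, here is a minimal sketch using only the standard library. The helper name abs_deviation and its center/summary parameters are made up for this illustration; they are not part of scipy, pandas, or numpy. The only thing that changes between the three results is which center is subtracted and which summary is taken at the end.

```python
from statistics import mean, median

x = list(range(0, 10)) + [1000]


def abs_deviation(values, center, summary):
    """Hypothetical helper: summarize |v - center(values)| with `summary`."""
    c = center(values)
    return summary([abs(v - c) for v in values])


# Median absolute deviation: median of deviations from the median
# (what the scipy call with scale=1 returned in the question).
median_about_median = abs_deviation(x, center=median, summary=median)

# Mean absolute deviation: mean of deviations from the mean
# (what pandas' mad method returned in the question).
mean_about_mean = abs_deviation(x, center=mean, summary=mean)

# The question's hand-made formula: median of deviations from the mean.
median_about_mean = abs_deviation(x, center=mean, summary=median)

print(median_about_median, mean_about_mean, median_about_mean)
```

The three values are 3, about 164.545, and 91, which match the three outputs in the question. The single outlier 1000 pulls the mean up to 95, so any variant that involves the mean is dominated by it.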

Upvotes: 23

Gunjan Kayal

Reputation: 31

In pandas, MAD is actually 'mean absolute deviation' and not 'median absolute deviation'.

You can find the equation used in pandas here: https://www.skytowner.com/explore/pandas_dataframe_mad_method
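
For reference, the mean absolute deviation can be written as below. This is the standard textbook form, not something copied from the pandas source, and it is worked through for the x from the question, where n = 11 and the mean is 95:

```latex
\[
\frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert
= \frac{1}{11}\bigl(95 + 94 + \dots + 86 + 905\bigr)
= \frac{1810}{11}
= 164.\overline{54},
\qquad
\bar{x} = \frac{1045}{11} = 95 .
\]
```

This agrees with the 164.54 that the pandas call printed in the question.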

Upvotes: 2
