NumPy and pandas DataFrame evaluate the median differently

Question

Why do pandas and NumPy evaluate some basic functions, such as the median, differently?

pandas omits NaN values automatically; NumPy does not.

import numpy as np
import pandas as pd

np.random.seed(10)

df = pd.DataFrame(np.random.randint(0, 10, size=10), columns=['x'])
df.loc[df.x > 1, 'x'] = np.nan

print(df)

#     x
#0  NaN
#1  NaN
#2  0.0
#3  1.0
#4  NaN
#5  0.0
#6  1.0
#7  NaN
#8  NaN
#9  0.0

print(df['x'].median())

#0.0

print(np.median(df['x']))

#nan

braml1 · Accepted Answer

They are two different libraries with different conventions and defaults: pandas reductions skip NaN by default (skipna=True), whereas np.median propagates NaN.
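The same split in defaults shows up in other basic reductions, not just the median. A minimal sketch, assuming a small made-up Series (the values and variable names below are illustrative and not taken from the question):

```python
import numpy as np
import pandas as pd

# Illustrative values, not the data from the question
s = pd.Series([0.0, 1.0, np.nan, 3.0])
a = s.to_numpy()

# pandas reductions default to skipna=True, so NaN is dropped first
pd_mean = s.mean()
pd_sum = s.sum()

# NumPy's plain reductions propagate NaN
np_mean = np.mean(a)
np_sum = np.sum(a)

# NumPy's nan-prefixed variants skip NaN, matching the pandas default
nan_mean = np.nanmean(a)
nan_sum = np.nansum(a)

print(pd_mean, np_mean, nan_mean)
print(pd_sum, np_sum, nan_sum)
```

In each pair, the pandas result agrees with the nan-prefixed NumPy function, while the plain NumPy function returns nan.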

If you want to ignore the NaN:

np.nanmedian(df['x'])
df['x'].median()

If you want to have a NaN result:

np.median(df['x'])
df['x'].median(skipna=False)
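To check that the two pairs above really agree, here is a small sketch that rebuilds the column from the printed output in the question (typed in by hand rather than regenerated from the random seed) and compares all four calls. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Column 'x' as printed in the question, entered explicitly
x = pd.Series(
    [np.nan, np.nan, 0.0, 1.0, np.nan, 0.0, 1.0, np.nan, np.nan, 0.0],
    name="x",
)

# Ignore NaN: both give the median of the five non-missing values
skip_pd = x.median()
skip_np = np.nanmedian(x.to_numpy())

# Propagate NaN: both give nan because the column contains missing values
keep_pd = x.median(skipna=False)
keep_np = np.median(x.to_numpy())

print(skip_pd, skip_np)  # 0.0 0.0
print(keep_pd, keep_np)  # nan nan
```

Dropping the missing values first, as in np.median(x.dropna()), is another way to get the NaN-ignoring result from the plain NumPy function.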
