Reputation: 807
Why do pandas and NumPy evaluate some basic functions, such as the median, differently?
pandas skips NaN values by default, whereas NumPy does not.
import numpy as np
import pandas as pd
np.random.seed(10)
df = pd.DataFrame(np.random.randint(0, 10, size=10), columns=['x'])
df.loc[df.x > 1, 'x'] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
print(df)
#      x
# 0  NaN
# 1  NaN
# 2  0.0
# 3  1.0
# 4  NaN
# 5  0.0
# 6  1.0
# 7  NaN
# 8  NaN
# 9  0.0
print(df['x'].median())
#0.0
print(np.median(df['x']))
#nan
Upvotes: 0
Views: 576
Reputation: 584
They are two different libraries with different defaults: np.median propagates NaN, while Series.median skips NaN by default (skipna=True).
To ignore NaN values:
np.nanmedian(df['x'])
df['x'].median()
To get NaN whenever the data contains a NaN:
np.median(df['x'])
df['x'].median(skipna=False)
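As a rough, self-contained sketch of the four calls above side by side: the Series `s` below is a hand-typed stand-in for the question's `df['x']` (an assumption made so the snippet does not depend on the random seed), and the variable names are only illustrative. The last two lines assume the same pattern applies to the mean via `np.nanmean`.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's column: five NaNs plus the values 0, 1, 0, 1, 0.
s = pd.Series([np.nan, np.nan, 0.0, 1.0, np.nan, 0.0, 1.0, np.nan, np.nan, 0.0], name="x")
values = s.to_numpy()

# NaN-skipping variants: the median is taken over the non-NaN values only.
skip_np = np.nanmedian(values)
skip_pd = s.median()              # skipna=True is the pandas default

# NaN-propagating variants: any NaN in the input makes the result NaN.
prop_np = np.median(values)
prop_pd = s.median(skipna=False)

print(skip_np, skip_pd)  # 0.0 0.0
print(prop_np, prop_pd)  # nan nan

# The same split shows up for the mean.
mean_skip_np = np.nanmean(values)
mean_skip_pd = s.mean()
```

Which behaviour is preferable depends on whether a NaN should be treated as "missing, ignore it" or as "unknown, so the result is unknown too"; being explicit with `skipna=` or the `nan*` functions avoids relying on either library's default.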
Upvotes: 3