len

Reputation: 807

Median from NumPy and pandas DataFrame is evaluated differently

Why do pandas and NumPy evaluate some basic functions, such as the median, differently?

Pandas automatically omits NaN values; NumPy does not.

import numpy as np
import pandas as pd

np.random.seed(10)

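# Use a float column so NaN can be assigned without an int-to-float upcast
df = pd.DataFrame(np.random.randint(0, 10, size=10).astype(float), columns=['x'])
# Replace every value greater than 1 with NaN
df.loc[df.x > 1, 'x'] = np.nan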

print(df)

#     x
#0  NaN
#1  NaN
#2  0.0
#3  1.0
#4  NaN
#5  0.0
#6  1.0
#7  NaN
#8  NaN
#9  0.0

print(df['x'].median())

#0.0

print(np.median(df['x']))

#nan
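The same split shows up in other reductions, not only the median. A minimal sketch, assuming the values are copied from the printed DataFrame above into a plain NumPy array (the names `values` and `s` are illustrative): pandas reductions skip NaN because their default is `skipna=True`, plain NumPy reductions on an ndarray propagate NaN, and NumPy's `nan*` functions skip it.

```python
import numpy as np
import pandas as pd

# Values copied from the printed DataFrame above
values = np.array([np.nan, np.nan, 0, 1, np.nan, 0, 1, np.nan, np.nan, 0])
s = pd.Series(values, name='x')

# pandas reductions skip NaN by default (skipna=True)
print(s.median(), s.mean(), s.sum())  # 0.0 0.4 2.0

# Plain NumPy reductions on an ndarray propagate NaN
print(np.median(values), np.mean(values), np.sum(values))  # nan nan nan

# NumPy's nan-aware variants skip NaN, matching the pandas defaults
print(np.nanmedian(values), np.nanmean(values), np.nansum(values))  # 0.0 0.4 2.0
```

The array is passed to NumPy directly rather than the Series, so each call is a genuine NumPy reduction.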

Upvotes: 0

Views: 576

Answers (1)

braml1

Reputation: 584

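They are two different libraries with different conventions and defaults: pandas reductions such as `Series.median` skip NaN by default (`skipna=True`), while `np.median` propagates NaN into the result.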

If you want to ignore NaN values:

np.nanmedian(df['x'])
df['x'].median()
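Both calls amount to dropping the NaN values and taking an ordinary median of what remains. A minimal sketch that rebuilds the question's seeded DataFrame (using a float column, an adjustment not in the original, so NaN can be assigned directly) and checks that equivalence:

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; a float column accepts NaN without upcasting
np.random.seed(10)
df = pd.DataFrame(np.random.randint(0, 10, size=10).astype(float), columns=['x'])
df.loc[df.x > 1, 'x'] = np.nan

arr = df['x'].to_numpy()

# Skip NaN explicitly: drop it, then take an ordinary median
dropped_numpy = np.median(arr[~np.isnan(arr)])
dropped_pandas = df['x'].dropna().median()

print(np.nanmedian(arr), df['x'].median())  # 0.0 0.0
print(dropped_numpy, dropped_pandas)        # 0.0 0.0
```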

If you want the result to be NaN whenever a NaN is present:

np.median(df['x'])
df['x'].median(skipna=False)
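The same options carry over to a whole DataFrame, where the median is taken per column. A minimal sketch with a hypothetical two-column frame (column `x` contains a NaN, column `y` does not), comparing `DataFrame.median` with `np.median` and `np.nanmedian` along `axis=0`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'x' contains a NaN, 'y' does not
df = pd.DataFrame({'x': [0.0, np.nan, 1.0, 0.0], 'y': [3.0, 1.0, 2.0, 5.0]})
arr = df.to_numpy()

# Column-wise medians that propagate NaN: only 'x' becomes NaN
propagate_pandas = df.median(skipna=False).tolist()
propagate_numpy = np.median(arr, axis=0).tolist()
print(propagate_pandas, propagate_numpy)  # [nan, 2.5] [nan, 2.5]

# Column-wise medians that skip NaN
skip_pandas = df.median().tolist()
skip_numpy = np.nanmedian(arr, axis=0).tolist()
print(skip_pandas, skip_numpy)  # [0.0, 2.5] [0.0, 2.5]
```

A NaN only affects the column it appears in, so `y` has the same median under every option.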

Upvotes: 3
