J-H
J-H

Reputation: 1869

Python Pandas Dataframe replace values below treshold

How can I apply a function element-wise to a pandas DataFrame and pass a column-wise calculated value (e.g. quantile of column)? For example, what if I want to replace all elements in a DataFrame (with NaN) where the value is lower than the 80th percentile of the column?

def _deletevalues(x, quantile):
if x < quantile:
    return np.nan
else:
    return x

df.applymap(lambda x: _deletevalues(x, x.quantile(0.8)))

Using applymap only allows one to access each value individually and throws (of course) an AttributeError: ("'float' object has no attribute 'quantile'

Thank you in advance.

Upvotes: 12

Views: 20243

Answers (2)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210982

In [139]: df
Out[139]:
   a  b  c
0  1  7  3
1  1  2  6
2  3  0  5
3  8  2  1
4  7  3  5
5  6  7  2
6  0  2  1
7  8  4  1
8  5  0  6
9  7  7  6

for all columns:

In [145]: df.apply(lambda x: np.where(x < x.quantile(),np.nan,x))
Out[145]:
     a    b    c
0  NaN  7.0  NaN
1  NaN  NaN  6.0
2  NaN  NaN  5.0
3  8.0  NaN  NaN
4  7.0  3.0  5.0
5  6.0  7.0  NaN
6  NaN  NaN  NaN
7  8.0  4.0  NaN
8  NaN  NaN  6.0
9  7.0  7.0  6.0

or

In [149]: df[df < df.quantile()] = np.nan

In [150]: df
Out[150]:
     a    b    c
0  NaN  7.0  NaN
1  NaN  NaN  6.0
2  NaN  NaN  5.0
3  8.0  NaN  NaN
4  7.0  3.0  5.0
5  6.0  7.0  NaN
6  NaN  NaN  NaN
7  8.0  4.0  NaN
8  NaN  NaN  6.0
9  7.0  7.0  6.0

Upvotes: 6

jezrael
jezrael

Reputation: 863791

Use DataFrame.mask:

df = df.mask(df < df.quantile())
print (df)
     a    b    c
0  NaN  7.0  NaN
1  NaN  NaN  6.0
2  NaN  NaN  5.0
3  8.0  NaN  NaN
4  7.0  3.0  5.0
5  6.0  7.0  NaN
6  NaN  NaN  NaN
7  8.0  4.0  NaN
8  NaN  NaN  6.0
9  7.0  7.0  6.0

Upvotes: 18

Related Questions