vince
vince

Reputation: 57

Python Pandas replace values if not in value range

I would like to replace all cells of a column if each value IS NOT in a specific value range.

E.g. value range between 0 and 10

The function should put np.NaN on all cells which are below 0 or above 10.

I tried with this:

df.loc[(df["B"] < 5 ), "B"] = np.NaN

but it only works with a specific value, not with a value range.

Is there a simple solution to replace all values outside a specific value range, without iterating through all rows?

Upvotes: 0

Views: 3300

Answers (4)

Lorena Gil
Lorena Gil

Reputation: 351

You can use np.where, specifying the desired conditions. If True, yield x, otherwise yield y.

np.where(condition, x, y)

So, the solution would be:

df.B = np.where((df.B < 0) & (df.B > 10), np.NaN, df.B)

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))

Will output something like that:


    A   B   C   D
0   2   5   6   2
1   0   4   0   0
2   4   3   9   0
3   5   1   1   8
4   2   3   6   5
5   3   0   3   9
6   0   4   3   4
7   4   1   4   5
8   0   5   1   5
9   6   7   4   4

Then if you apply the where condition:

df.B = np.where((df.B < 6) & (df.B > 2), np.NaN, df.B)
    A   B   C   D
0   2   NaN 6   2
1   0   NaN 0   0
2   4   NaN 9   0
3   5   1.0 1   8
4   2   NaN 6   5
5   3   0.0 3   9
6   0   NaN 3   4
7   4   1.0 4   5
8   0   NaN 1   5
9   6   7.0 4   4

You can find more information here: https://numpy.org/doc/stable/reference/generated/numpy.where.html

Upvotes: 3

辜乘风
辜乘风

Reputation: 152

I will choose between func

df.loc[~df.B.between(0, 10), "B"] = np.nan

Upvotes: 5

qmeeus
qmeeus

Reputation: 2412

Yup, you can just do something like this:


df["B"] = df["B"].where((df["B"] >= 0) & (df["B"] <= 10))

# or
df["B"] = df["B"].map(lambda x: x if 0 <= x <= 10 else None)

# or
df.loc[(df["B"] < 0) | (df["B"] > 10), "B"] = None

Upvotes: 1

igrinis
igrinis

Reputation: 13656

More closely to your original syntax

df.loc[(df["B"] < 0 )|(df["B"] > 10 ), "B"] = np.NaN

Upvotes: 2

Related Questions