Christian O.

Reputation: 498

Pandas: Remove values that meet condition

Let's say I have data like this:

import pandas as pd

df = pd.DataFrame({'category': ["blue", "red", "blue", "blue", "green"], 'val1': [5, 3, 2, 2, 5], 'val2': [1, 3, 2, 2, 5], 'val3': [2, 1, 1, 4, 3]})
print(df)

  category  val1  val2  val3
0     blue     5     1     2
1      red     3     3     1
2     blue     2     2     1
3     blue     2     2     4
4    green     5     5     3

How do I remove values that meet a certain condition (or replace them with, for example, NaN) without removing the entire row or shifting the columns?

Let's say my condition is that I want to remove all values below 3 from the above data. The result would have to look like this:

  category  val1  val2  val3
0     blue     5     
1      red     3     3     
2     blue     
3     blue                 4
4    green     5     5     3

Upvotes: 1

Views: 688

Answers (3)

Dani Mesejo

Reputation: 61900

Use mask:

df.iloc[:, 1:] = df.iloc[:, 1:].mask(df.iloc[:, 1:] < 3)
print(df)

Output

  category  val1  val2  val3
0     blue   5.0   NaN   NaN
1      red   3.0   3.0   NaN
2     blue   NaN   NaN   NaN
3     blue   NaN   NaN   4.0
4    green   5.0   5.0   3.0

If you want to set a particular value instead of NaN, for example 0, pass it as the second argument (starting again from the original df):

df.iloc[:, 1:] = df.iloc[:, 1:].mask(df.iloc[:, 1:] < 3, 0)
print(df)

Output

  category  val1  val2  val3
0     blue     5     0     0
1      red     3     3     0
2     blue     0     0     0
3     blue     0     0     4
4    green     5     5     3

If you only need to mask a few columns, you could select them by name:

df[['val1', 'val2', 'val3']] = df[['val1', 'val2', 'val3']].mask(df[['val1', 'val2', 'val3']] < 3)
print(df)

Output

  category  val1  val2  val3
0     blue   5.0   NaN   NaN
1      red   3.0   3.0   NaN
2     blue   NaN   NaN   NaN
3     blue   NaN   NaN   4.0
4    green   5.0   5.0   3.0
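
The same idea can be written without positional slicing. The following is a minimal sketch, assuming the frame from the question; the names num_cols, result and same are introduced here for illustration only. It selects the numeric columns by dtype and writes into a copy, so the original df stays untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ["blue", "red", "blue", "blue", "green"],
    'val1': [5, 3, 2, 2, 5],
    'val2': [1, 3, 2, 2, 5],
    'val3': [2, 1, 1, 4, 3],
})

# Pick the numeric columns by dtype rather than by position.
num_cols = df.select_dtypes(include='number').columns

# mask() replaces values where the condition is True (with NaN by default).
result = df.copy()
result[num_cols] = df[num_cols].mask(df[num_cols] < 3)

# where() is the complement: it keeps values where the condition is True.
same = df[num_cols].where(df[num_cols] >= 3)

print(result)
```

Working on a copy is a design choice here: it makes it easy to rerun the snippet with a different threshold or fill value without rebuilding df first.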

Upvotes: 4

Danail Petrov

Reputation: 1875

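You can apply applymap or transform to the columns containing integers to build a boolean mask, then blank out the values that fail it. Keep the category column out of the masking step, otherwise it is blanked as well:

vals = df.iloc[:, 1:]
df.iloc[:, :1].join(vals.where(vals.transform(lambda x: x >= 3), ''))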

Upvotes: 0

oli5679

Reputation: 1749

One approach is to create a mask of the values to keep, i.e. those that don't meet the removal criterion (here, values of 3 and above).

mask = df[['val1','val2','val3']] >= 3

You can then create a new df that holds only the values that were kept; the removed ones become NaN.

updated_df = df[['val1','val2','val3']][mask]

Finally, you need to add the unaffected columns back in.

updated_df['category'] = df['category']
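
Put together, the steps above could look like the sketch below. It assumes the frame from the question and keeps values of 3 and above, matching the condition stated there; the names value_cols and keep are introduced for illustration, and the final reordering step is an addition so that category ends up first again.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ["blue", "red", "blue", "blue", "green"],
    'val1': [5, 3, 2, 2, 5],
    'val2': [1, 3, 2, 2, 5],
    'val3': [2, 1, 1, 4, 3],
})

value_cols = ['val1', 'val2', 'val3']

# True where a value should be kept (3 or above).
keep = df[value_cols] >= 3

# Indexing with a boolean frame leaves NaN where the mask is False.
updated_df = df[value_cols][keep]

# Re-attach the untouched column, then restore the original column order.
updated_df['category'] = df['category']
updated_df = updated_df[df.columns]

print(updated_df)
```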

Upvotes: 0
