pseudocode425

Reputation: 77

Pandas replace values in dataframe conditionally based on string compare

I have a pandas dataframe as below with 3 columns. I want to compare each column to see if the value matches a particular string, and if yes, replace the value with NaN.

For example, if there are 5 values in column 1 of the data frame:

abcd
abcd
defg
abcd
defg

and the comparison string is defg, the end result for column 1 in the dataframe should be:

abcd
abcd
NaN
abcd
NaN

Upvotes: 2

Views: 11938

Answers (4)

Drew Nicolette

Reputation: 162

There are several ways to do this. If you want to practice using lambda functions, you could do:

df['Col1'] = df.Col1.apply(lambda x: np.nan if x == 'defg' else x)

Result:

0  abcd
1  abcd
2   NaN
3  abcd
4   NaN
Seconds:  0.0020899999999999253

In my quick timing tests, this was probably a little slower than the other solutions posted here.
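For anyone who wants to run the lambda approach end to end, here is a minimal self-contained sketch. It assumes the sample data from the question and the column name Col1 used in the line above; it is only an illustration, not a benchmark.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the column name "Col1" is an assumption.
df = pd.DataFrame({"Col1": ["abcd", "abcd", "defg", "abcd", "defg"]})

# Element-wise replacement: NaN where the value equals the target string,
# otherwise the original value is kept.
df["Col1"] = df["Col1"].apply(lambda x: np.nan if x == "defg" else x)

print(df)
```

Because apply calls the lambda once per element, it is reasonable to expect it to be slower than vectorized methods on large columns.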

Upvotes: 2

Scott Boston

Reputation: 153460

You can use mask. This replaces 'defg' with NaN across the entire dataframe:

df.mask(df == 'defg')

Output:

      0
0  abcd
1  abcd
2   NaN
3  abcd
4   NaN

You can also do this for a single column:

df['col1'].mask(df['col1'] == 'defg')

Or use replace, as @pygo suggests in his solution:

df['col1'].replace('defg',np.nan)
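Since the question mentions three columns, here is a minimal sketch of mask applied to a whole frame and to a subset of columns. The column names and the extra values (such as wxyz) are made up for illustration. Note that mask returns a new object, so the result has to be assigned somewhere.

```python
import pandas as pd

# Hypothetical three-column frame; names and values are illustrative only.
df = pd.DataFrame({
    "col1": ["abcd", "abcd", "defg", "abcd", "defg"],
    "col2": ["defg", "wxyz", "wxyz", "defg", "wxyz"],
    "col3": ["abcd", "defg", "abcd", "abcd", "abcd"],
})

# Whole frame: every cell equal to "defg" becomes NaN in the returned copy.
masked = df.mask(df == "defg")

# Subset of columns: only col1 and col2 are changed, col3 is left alone.
subset = ["col1", "col2"]
partial = df.copy()
partial[subset] = partial[subset].mask(partial[subset] == "defg")

print(masked)
print(partial)
```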

Upvotes: 1

Karn Kumar

Reputation: 8816

Use pandas' built-in replace method with regex=True, which treats 'defg' as a regular expression, and pass NumPy's np.nan as the replacement value. To make the change permanent in the dataframe, either use inplace=True or assign the result back to the column.

import pandas as pd
import numpy as np

Example DataFrame:

df
   col1
0  abcd
1  abcd
2  defg
3  abcd
4  defg

Result:

# Assign the result back: inplace=True on df['col1'] may not update df in newer pandas versions
df['col1'] = df['col1'].replace(['defg'], np.nan, regex=True)
df
   col1
0  abcd
1  abcd
2   NaN
3  abcd
4   NaN
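One thing to keep in mind with regex=True is that the pattern is a regular expression, so it can match more than the exact string. If only exact matches should become NaN, anchoring the pattern is a safer choice. The sketch below assumes a made-up extra value, defgh, to show that an anchored pattern leaves it untouched.

```python
import numpy as np
import pandas as pd

# "defgh" is a hypothetical value added to show the effect of anchoring.
df = pd.DataFrame({"col1": ["abcd", "defgh", "defg", "abcd", "defg"]})

# ^ and $ anchor the pattern, so only cells that are exactly "defg" match.
df["col1"] = df["col1"].replace(r"^defg$", np.nan, regex=True)

print(df)
```

If no pattern matching is needed at all, dropping regex=True and passing the plain string avoids the question entirely.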

Upvotes: 4

Toby Petty

Reputation: 4660

You can use numpy where to set values based on boolean conditions:

import numpy as np
df["col_name"] = np.where(df["col_name"]=="defg", np.nan, df["col_name"])

Replace col_name with your actual column name.

An alternative is to use pandas .loc to change the values in the DataFrame in place:

df.loc[df["col_name"]=="defg", "col_name"] = np.nan
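To apply the .loc approach to several columns, as the question asks, one option is a small loop. The helper name nan_where_equal and the column names below are hypothetical and only for illustration; this is a sketch, not the only way to do it.

```python
import numpy as np
import pandas as pd


def nan_where_equal(df, columns, target):
    """Return a copy of df with `target` replaced by NaN in the given columns."""
    out = df.copy()
    for col in columns:
        # Boolean row selection plus column label, as in the one-liner above.
        out.loc[out[col] == target, col] = np.nan
    return out


# Hypothetical three-column frame; names and values are illustrative only.
df = pd.DataFrame({
    "col1": ["abcd", "abcd", "defg", "abcd", "defg"],
    "col2": ["defg", "wxyz", "wxyz", "defg", "wxyz"],
    "col3": ["abcd", "defg", "abcd", "abcd", "abcd"],
})

result = nan_where_equal(df, ["col1", "col2", "col3"], "defg")
print(result)
```

Working on a copy keeps the original frame unchanged, which makes it easier to compare before and after.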

Upvotes: 1
