ah bon

Reputation: 10021

Replace 0 with NaN for selected columns only if all values are 0 in Pandas

Given a dataframe df as follows:

   id  value1  value2  value3
0   0      22       1       7
1   1       0       0       0
2   2       0       0       0
3   3       4       1      25
4   4       5       0      24
5   5       0       0       3

Or:

import numpy as np
import pandas as pd

df = pd.DataFrame([{'id': 0, 'value1': 22, 'value2': 1, 'value3': 7},
 {'id': 1, 'value1': 0, 'value2': 0, 'value3': 0},
 {'id': 2, 'value1': 0, 'value2': 0, 'value3': 0},
 {'id': 3, 'value1': 4, 'value2': 1, 'value3': 25},
 {'id': 4, 'value1': 5, 'value2': 0, 'value3': 24},
 {'id': 5, 'value1': 0, 'value2': 0, 'value3': 3}])

I need to take the columns 'value1', 'value2', 'value3' and replace 0 with NaN only in rows where all three values are 0. How can I achieve that in Pandas?

The expected result looks like this:

   id  value1  value2  value3
0   0    22.0     1.0     7.0
1   1     NaN     NaN     NaN
2   2     NaN     NaN     NaN
3   3     4.0     1.0    25.0
4   4     5.0     0.0    24.0
5   5     0.0     0.0     3.0

Code I have tried:

cols = ['value1', 'value2', 'value3']
df[cols] = df[cols].replace({'0': np.NaN, 0: np.NaN})

Upvotes: 2

Views: 3612

Answers (4)

Tanner Ormanoski

Reputation: 35

I know this is old, but I was trying to figure out the same, and after the preceding solutions didn’t work for me, I found this.

df[cols] = df[cols].replace(to_replace=0, value=np.nan)

Hopefully this solution will be useful for anyone who experiences the same problem.

Upvotes: 1

U13-Forward

Reputation: 71580

Use mask:

df[cols] = df[cols].mask(df[cols].eq(0).all(axis=1))

mask sets every selected column in a row to NaN wherever the row-wise condition (df[cols].eq(0).all(axis=1)) is True; all other rows are left unchanged.
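For anyone who wants to run this end to end, here is a minimal, self-contained sketch built on the question's sample data. The names all_zero and result are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "value1": [22, 0, 0, 4, 5, 0],
    "value2": [1, 0, 0, 1, 0, 0],
    "value3": [7, 0, 0, 25, 24, 3],
})
cols = ["value1", "value2", "value3"]

# Boolean Series: True for rows where every selected column equals 0
all_zero = df[cols].eq(0).all(axis=1)

# Work on a copy so the original frame stays available for comparison
result = df.copy()

# mask() puts NaN in every selected column of the rows flagged True
result[cols] = df[cols].mask(all_zero)

print(result)
```

The selected columns come back as float, since NaN cannot be stored in a plain integer column.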

Original answer:

I'd prefer mask:

>>> df.set_index('id').mask(df[cols].eq(0).all(axis=1))
    value1  value2  value3
id                        
0     22.0     1.0     7.0
1      NaN     NaN     NaN
2      NaN     NaN     NaN
3      4.0     1.0    25.0
4      5.0     0.0    24.0
5      0.0     0.0     3.0
>>> 

With resetting index:

>>> df.set_index('id').mask(df[cols].eq(0).all(axis=1)).reset_index()
   id  value1  value2  value3
0   0    22.0     1.0     7.0
1   1     NaN     NaN     NaN
2   2     NaN     NaN     NaN
3   3     4.0     1.0    25.0
4   4     5.0     0.0    24.0
5   5     0.0     0.0     3.0
>>> 

Upvotes: 4

Pedro Maia

Reputation: 2722

You're mixing types: your code tries to compare two different data types and throws a TypeError: Cannot compare types 'ndarray(dtype=float64)' and 'str'. To make your code work, simply remove '0': np.NaN from your dict, or pass two arguments instead of a dict:

df[cols] = df[cols].replace(0, float('NaN'))
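Note that a plain replace like the one above touches every 0 in the selected columns, while the question only wants rows where all three values are 0. The sketch below, using the question's sample data, counts the NaN values produced each way. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "value1": [22, 0, 0, 4, 5, 0],
    "value2": [1, 0, 0, 1, 0, 0],
    "value3": [7, 0, 0, 25, 24, 3],
})
cols = ["value1", "value2", "value3"]

# Blanket replacement: every 0 becomes NaN, including the zeros in rows 4 and 5
every_zero = df[cols].replace(0, np.nan)

# Row-wise condition: only rows where all selected columns are 0 become NaN
all_zero_rows = df[cols].mask(df[cols].eq(0).all(axis=1))

n_blanket = int(every_zero.isna().sum().sum())
n_rowwise = int(all_zero_rows.isna().sum().sum())
print(n_blanket, n_rowwise)
```

If the two counts differ, the blanket replace has also blanked zeros in rows that still contain non-zero values.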

Upvotes: 1

BigBen

Reputation: 50008

Perhaps:

mask = (df[cols].eq(0).all(axis=1))
df.loc[mask, cols] = np.nan

Output:

   id  value1  value2  value3
0   0    22.0     1.0     7.0
1   1     NaN     NaN     NaN
2   2     NaN     NaN     NaN
3   3     4.0     1.0    25.0
4   4     5.0     0.0    24.0
5   5     0.0     0.0     3.0
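One possible refinement, offered as an assumption rather than part of the original answer: the columns start out as integers, and recent pandas versions may warn about writing NaN into an integer column through .loc. Casting the selected columns to float first avoids relying on implicit upcasting. A minimal sketch on the question's sample data:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "value1": [22, 0, 0, 4, 5, 0],
    "value2": [1, 0, 0, 1, 0, 0],
    "value3": [7, 0, 0, 25, 24, 3],
})
cols = ["value1", "value2", "value3"]

# Make the target columns float up front so they can hold NaN
df[cols] = df[cols].astype(float)

# True for rows where every selected column equals 0
all_zero = df[cols].eq(0).all(axis=1)

# Overwrite only those rows, only in the selected columns
df.loc[all_zero, cols] = np.nan

print(df)
```

The id column is never touched, so it keeps its integer dtype.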

Upvotes: 3
