Reputation:
I want to replace negative values with NaN in only certain columns. The simplest way could be:
for col in ['a', 'b', 'c']:
    df.loc[df[col] < 0, col] = np.nan
df could have many columns, and I only want to do this to specific columns.
Is there a way to do this in one line? It seems like this should be easy, but I have not been able to figure it out.
Upvotes: 4
Views: 2614
Reputation: 294488
Use loc and where:
cols = ['a', 'b', 'c']
df.loc[:, cols] = df[cols].where(df[cols].ge(0), np.nan)
demonstration
df = pd.DataFrame(np.random.randn(10, 5), columns=list('abcde'))
df
cols = list('abc')
df.loc[:, cols] = df[cols].where(df[cols].ge(0), np.nan)
df
You could speed it up with numpy:
df[cols] = np.where(df[cols] < 0, np.nan, df[cols])
This does the same thing.
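As a quick sanity check that the where-based and np.where-based assignments agree, here is a minimal sketch on seeded random data. The seed, the frame size, and the names base, a_df and b_df are my own assumptions for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
base = pd.DataFrame(rng.standard_normal((10, 5)), columns=list("abcde"))
cols = ["a", "b", "c"]

# Variant 1: DataFrame.where keeps values where the condition holds, NaN elsewhere
a_df = base.copy()
a_df.loc[:, cols] = a_df[cols].where(a_df[cols].ge(0), np.nan)

# Variant 2: np.where builds a plain ndarray that is assigned back to the columns
b_df = base.copy()
b_df[cols] = np.where(b_df[cols] < 0, np.nan, b_df[cols])

same = a_df.equals(b_df)                                # NaNs in matching positions compare equal
untouched = a_df[["d", "e"]].equals(base[["d", "e"]])   # columns outside cols are unchanged
```

Both flags should come out True if the two variants really are interchangeable for this data.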
timing
def gen_df(n):
    return pd.DataFrame(np.random.randn(n, 5), columns=list('abcde'))
Since assignment is an important part of this, I create the df from scratch each loop. I also added the timing for df creation.
for n = 10000
for n = 100000
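A rough sketch of how such a timing could be reproduced with the standard timeit module. The helper names with_where, with_np_where and time_all, and the repeat count, are assumptions of mine. Results depend on the machine and library versions, so no numbers are quoted.

```python
import timeit

import numpy as np
import pandas as pd

cols = list("abc")


def gen_df(n):
    return pd.DataFrame(np.random.randn(n, 5), columns=list("abcde"))


def with_where(n):
    # Build a fresh frame, then assign with DataFrame.where
    df = gen_df(n)
    df.loc[:, cols] = df[cols].where(df[cols].ge(0), np.nan)
    return df


def with_np_where(n):
    # Build a fresh frame, then assign with np.where
    df = gen_df(n)
    df[cols] = np.where(df[cols] < 0, np.nan, df[cols])
    return df


def time_all(n, number=5):
    """Return average seconds per call for creation alone and for each variant."""
    funcs = {"gen_df": gen_df, "where": with_where, "np.where": with_np_where}
    return {
        name: timeit.timeit(lambda f=f: f(n), number=number) / number
        for name, f in funcs.items()
    }


if __name__ == "__main__":
    for n in (10_000, 100_000):
        print(n, time_all(n))
```

Subtracting the gen_df time from the other two gives an estimate of the assignment cost alone.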
Upvotes: 7
Reputation: 131640
Sure, just pick your desired columns out of the mask:
(df < 0)[['a', 'b', 'c']]
You can use this mask in an assignment:
df[(df < 0)[['a', 'b', 'c']]] = np.nan
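To make the mask concrete, here is a small sketch on a hand-made frame (the data is invented for illustration). It assumes that pandas aligns a boolean frame to the target when it is used as a key, so columns missing from the mask are left untouched. Float columns are used so that inserting NaN does not change any dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [-1.0, 2.0], "b": [3.0, -4.0], "c": [-5.0, 6.0], "d": [-7.0, 8.0]}
)

# Boolean frame restricted to the columns of interest
mask = (df < 0)[["a", "b", "c"]]

# Column d is absent from the mask, so its negative value is kept
df[mask] = np.nan
```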
Upvotes: 2
Reputation: 12515
I don't think you'll get much simpler than this:
>>> df = pd.DataFrame({'a': np.arange(-5, 2), 'b': np.arange(-5, 2), 'c': np.arange(-5, 2), 'd': np.arange(-5, 2), 'e': np.arange(-5, 2)})
>>> df
   a  b  c  d  e
0 -5 -5 -5 -5 -5
1 -4 -4 -4 -4 -4
2 -3 -3 -3 -3 -3
3 -2 -2 -2 -2 -2
4 -1 -1 -1 -1 -1
5  0  0  0  0  0
6  1  1  1  1  1
>>> cols = ['a', 'b', 'c']
>>> df[df[cols] < 0] = np.nan
>>> df
     a    b    c  d  e
0  NaN  NaN  NaN -5 -5
1  NaN  NaN  NaN -4 -4
2  NaN  NaN  NaN -3 -3
3  NaN  NaN  NaN -2 -2
4  NaN  NaN  NaN -1 -1
5  0.0  0.0  0.0  0  0
6  1.0  1.0  1.0  1  1
Upvotes: 9
Reputation: 2104
If it has to be a one-liner:
df[['a', 'b', 'c']] = df[['a', 'b', 'c']].apply(lambda c: [x if x >= 0 else np.nan for x in c])
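Element-wise one-liners like this need care with zero. The older `cond and x or default` idiom falls through to the default whenever x itself is falsy, so a zero would be turned into NaN even though it is not negative. A minimal sketch of the difference, with hypothetical helper names:

```python
import numpy as np


def keep_non_negative(x):
    # Conditional expression: zero satisfies x >= 0, so it is kept
    return x if x >= 0 else np.nan


def and_or_idiom(x):
    # `x > 0 and x` is falsy for 0 (and for negatives), so `or` yields NaN
    return x > 0 and x or np.nan


values = [-2.0, 0.0, 3.0]
fixed = [keep_non_negative(v) for v in values]  # NaN, 0.0, 3.0
idiom = [and_or_idiom(v) for v in values]       # NaN, NaN, 3.0
```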
Upvotes: 1
Reputation: 394159
You can use np.where to achieve this:
In [47]:
df = pd.DataFrame(np.random.randn(5,5), columns=list('abcde'))
df
Out[47]:
          a         b         c         d         e
0  0.616829 -0.933365 -0.735308  0.665297 -1.333547
1  0.069158  2.266290 -0.068686 -0.787980 -0.082090
2  1.203311  1.661110 -1.227530 -1.625526  0.045932
3 -0.247134 -1.134400  0.355436  0.787232 -0.474243
4  0.131774  0.349103 -0.632660 -1.549563  1.196455
In [48]:
df[['a','b','c']] = np.where(df[['a','b','c']] < 0, np.nan, df[['a','b','c']])
df
Out[48]:
          a         b         c         d         e
0  0.616829       NaN       NaN  0.665297 -1.333547
1  0.069158  2.266290       NaN -0.787980 -0.082090
2  1.203311  1.661110       NaN -1.625526  0.045932
3       NaN       NaN  0.355436  0.787232 -0.474243
4  0.131774  0.349103       NaN -1.549563  1.196455
Upvotes: 3
Reputation:
Here's a way:
df[df.columns.isin(['a', 'b', 'c']) & (df < 0)] = np.nan
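This works because df.columns.isin(...) yields one boolean per column, which is then combined with the element-wise comparison column by column. A small sketch on invented data (the frame and the names col_mask and mask are my own), written with the DataFrame on the left of the & operator:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [-1.0, 2.0], "b": [3.0, -4.0], "d": [-5.0, 6.0]})
cols = ["a", "b"]

# One flag per column: array([ True,  True, False])
col_mask = df.columns.isin(cols)

# Each flag applies to every row of its column, so column d is all False
mask = (df < 0) & col_mask

df[mask] = np.nan
```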
Upvotes: 5