Reputation: 954
I want to change all values less than 5 in the following df with nan, but column B should be excluded from the operation without dropping it.
A B C D
DateTime
2016-03-03 05:45:00 1 2 3 4
2016-03-03 06:00:00 1 2 3 4
2016-03-03 06:15:00 1 2 3 4
2016-03-03 06:30:00 1 2 3 4
2016-03-03 06:45:00 1 2 3 4
desired result
A B C D
DateTime
2016-03-03 05:45:00 NaN 2 NaN NaN
2016-03-03 06:00:00 NaN 2 NaN NaN
2016-03-03 06:15:00 NaN 2 NaN NaN
2016-03-03 06:30:00 NaN 2 NaN NaN
2016-03-03 06:45:00 NaN 2 NaN NaN
I can take colum B out of the df then apply df[df < 5] = np.nan
to the remaining df, then combine them again. Dropping column B before the operation can also be another approach. But I am looking for a more efficient way, one liner if posible.
Trying df[df.columns.difference(['B']) < 5] = np.nan
, but it is not correct. Also df[(df.B != 'Other') < 5] = np.nan
without a success.
Upvotes: 2
Views: 295
Reputation: 88226
Working from your code, you can do instead:
mask = (df.loc[:,df.columns.difference(['B']).tolist()] < 5).any()
df[mask[mask].index] = np.nan
Note that df.columns.difference(['B'])
is a list of columns excluding B
. So it doesn't make sense to see which are < 5
. You firstly have to slice the dataframe with these columns to then check the consition. Finally you have to add any
to check if there is at least a True
.
Upvotes: 2
Reputation: 402263
Let's use a more sensible example:
A B C D
DateTime
2016-03-03 05:45:00 1 2 3 4
2016-03-03 06:00:00 1 2 3 10
2016-03-03 06:15:00 1 2 6 4
2016-03-03 06:30:00 1 2 3 4
2016-03-03 06:45:00 1 2 6 10
df.loc[:, df.columns.difference(['B'])] = df[df >= 5]
df
A B C D
DateTime
2016-03-03 05:45:00 NaN 2 NaN NaN
2016-03-03 06:00:00 NaN 2 NaN 10.0
2016-03-03 06:15:00 NaN 2 6.0 NaN
2016-03-03 06:30:00 NaN 2 NaN NaN
2016-03-03 06:45:00 NaN 2 6.0 10.0
This masks everything, but only assigns based on loc
.
Another option is masking with update
:
v = df[df >= 5]
v.update(df[['B']])
A B C D
DateTime
2016-03-03 05:45:00 NaN 2.0 NaN NaN
2016-03-03 06:00:00 NaN 2.0 NaN 10.0
2016-03-03 06:15:00 NaN 2.0 6.0 NaN
2016-03-03 06:30:00 NaN 2.0 NaN NaN
2016-03-03 06:45:00 NaN 2.0 6.0 10.0
Upvotes: 3
Reputation: 332
You can do this simply by slicing down the columns
import pandas as pd
import numpy as np
df = pd.DataFrame({l:range(10) for l in 'ABCDEFGH'})
dont_change=['B']
cols = [col for col in df.columns if col not in dont_change]
df_sel = df.loc[:,cols] # select correct columns
df_sel[df_sel<5]=np.nan # modify
df[cols]=df_sel #reassign
Upvotes: 1
Reputation: 323226
You may using mask
df.mask(df.lt(5)).combine_first(df[['B']])
Out[258]:
A B C D
DateTime
2016-03-0305:45:00 NaN 2.0 NaN NaN
2016-03-0306:00:00 NaN 2.0 NaN NaN
2016-03-0306:15:00 NaN 2.0 NaN NaN
2016-03-0306:30:00 NaN 2.0 NaN NaN
2016-03-0306:45:00 NaN 2.0 NaN NaN
Upvotes: 1