k.ko3n
k.ko3n

Reputation: 954

Excluding a column from an operation without dropping it

I want to change all values less than 5 in the following df with nan, but column B should be excluded from the operation without dropping it.

                    A   B   C   D
DateTime                
2016-03-03 05:45:00 1   2   3   4
2016-03-03 06:00:00 1   2   3   4
2016-03-03 06:15:00 1   2   3   4
2016-03-03 06:30:00 1   2   3   4
2016-03-03 06:45:00 1   2   3   4

desired result

                    A   B   C   D
DateTime                
2016-03-03 05:45:00 NaN 2   NaN NaN
2016-03-03 06:00:00 NaN 2   NaN NaN
2016-03-03 06:15:00 NaN 2   NaN NaN
2016-03-03 06:30:00 NaN 2   NaN NaN
2016-03-03 06:45:00 NaN 2   NaN NaN

I can take colum B out of the df then apply df[df < 5] = np.nan to the remaining df, then combine them again. Dropping column B before the operation can also be another approach. But I am looking for a more efficient way, one liner if posible. Trying df[df.columns.difference(['B']) < 5] = np.nan, but it is not correct. Also df[(df.B != 'Other') < 5] = np.nan without a success.

Upvotes: 2

Views: 295

Answers (5)

yatu
yatu

Reputation: 88226

Working from your code, you can do instead:

mask = (df.loc[:,df.columns.difference(['B']).tolist()] < 5).any()
df[mask[mask].index] = np.nan

Note that df.columns.difference(['B']) is a list of columns excluding B. So it doesn't make sense to see which are < 5. You firstly have to slice the dataframe with these columns to then check the consition. Finally you have to add any to check if there is at least a True.

Upvotes: 2

cs95
cs95

Reputation: 402263

Let's use a more sensible example:

                     A  B  C   D
DateTime                        
2016-03-03 05:45:00  1  2  3   4
2016-03-03 06:00:00  1  2  3  10
2016-03-03 06:15:00  1  2  6   4
2016-03-03 06:30:00  1  2  3   4
2016-03-03 06:45:00  1  2  6  10

df.loc[:, df.columns.difference(['B'])] = df[df >= 5] 
df
                      A  B    C     D
DateTime                             
2016-03-03 05:45:00 NaN  2  NaN   NaN
2016-03-03 06:00:00 NaN  2  NaN  10.0
2016-03-03 06:15:00 NaN  2  6.0   NaN
2016-03-03 06:30:00 NaN  2  NaN   NaN
2016-03-03 06:45:00 NaN  2  6.0  10.0

This masks everything, but only assigns based on loc.


Another option is masking with update:

v = df[df >= 5]
v.update(df[['B']])

                      A    B    C     D
DateTime                               
2016-03-03 05:45:00 NaN  2.0  NaN   NaN
2016-03-03 06:00:00 NaN  2.0  NaN  10.0
2016-03-03 06:15:00 NaN  2.0  6.0   NaN
2016-03-03 06:30:00 NaN  2.0  NaN   NaN
2016-03-03 06:45:00 NaN  2.0  6.0  10.0

Upvotes: 3

James Fulton
James Fulton

Reputation: 332

You can do this simply by slicing down the columns

import pandas as pd
import numpy as np
df = pd.DataFrame({l:range(10) for l in 'ABCDEFGH'})

dont_change=['B']

cols = [col for col in df.columns if col not in dont_change]

df_sel = df.loc[:,cols] # select correct columns
df_sel[df_sel<5]=np.nan # modify
df[cols]=df_sel #reassign

Upvotes: 1

Ricky Kim
Ricky Kim

Reputation: 2022

df[df[df.columns.difference(['B'])]<5]=np.nan

Upvotes: 1

BENY
BENY

Reputation: 323226

You may using mask

df.mask(df.lt(5)).combine_first(df[['B']])

Out[258]: 
                     A    B   C   D
DateTime                           
2016-03-0305:45:00 NaN  2.0 NaN NaN
2016-03-0306:00:00 NaN  2.0 NaN NaN
2016-03-0306:15:00 NaN  2.0 NaN NaN
2016-03-0306:30:00 NaN  2.0 NaN NaN
2016-03-0306:45:00 NaN  2.0 NaN NaN

Upvotes: 1

Related Questions