Reputation: 959
I need to null values in several columns where they are less in absolute value than correspond values in the threshold column
import pandas as pd
import numpy as np
df=pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
'key2': [2000, 2001, 2002, 2001, 2002],
'data1': np.random.randn(5),
'data2': np.random.randn(5),
'threshold': [0.5,0.4,0.6,0.1,0.2]}).set_index(['key1','key2'])
data1 data2 threshold
key1 key2
Ohio 2000 0.201240 0.083833 0.5
2001 -1.993489 -1.081208 0.4
2002 0.759038 -1.688769 0.6
Nevada 2001 -0.543916 1.412679 0.1
2002 -1.545781 0.181224 0.2
this gives me an error "cannot join with no level specified and no overlapping names" df.where(df.abs()>df['threshold'])
this works but obviously against a scalar df.where(df.abs()>0.5)
data1 data2 threshold
key1 key2
Ohio 2000 NaN NaN NaN
2001 -1.993489 -1.081208 NaN
2002 0.759038 -1.688769 NaN
Nevada 2001 -0.543916 1.412679 NaN
2002 -1.545781 NaN NaN
BTW, this does appear to be giving me an OK result - still want to find out how to do it with where() method
df.apply(lambda x:x.where(x.abs()>x['threshold']),axis=1)
Upvotes: 3
Views: 2642
Reputation: 60060
Here's a slightly different option using the DataFrame.gt
(greater than) method.
df[df.abs().gt(df['threshold'], axis='rows')]
Out[16]:
# Output might not look the same because of different random numbers,
# use np.random.seed() for reproducible random number gen
Out[13]:
data1 data2 threshold
key1 key2
Ohio 2000 NaN NaN NaN
2001 1.954543 1.372174 NaN
2002 NaN NaN NaN
Nevada 2001 0.275814 0.854617 NaN
2002 NaN 0.204993 NaN
Upvotes: 3