Reputation: 163
Suppose I have a pandas DataFrame whose first column is a threshold:
threshold,value1,value2,value3,...,valueN
5,12,3,4,...,20
4,1,7,8,...,3
7,5,2,8,...,10
For each row, I want to set the elements in columns value1..valueN to zero if they are less than the threshold:
threshold,value1,value2,value3,...,valueN
5,12,0,0,...,20
4,0,7,8,...,0
7,0,0,8,...,10
How can I do this without explicit for loops?
Upvotes: 3
Views: 992
Reputation: 863801
Use DataFrame.lt for the comparison, combined with mask:
df = df.mask(df.lt(df['threshold'], axis=0), 0)
Or use set_index and reset_index:
df = df.set_index('threshold')
df = df.mask(df.lt(df.index, axis=0), 0).reset_index()
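To make the two variants above concrete, here is a minimal sketch run on the three sample rows from the question. The elided middle columns are left out, so the frame below (with only value1, value2, value3 and valueN) is an assumption for illustration, as are the variable names `masked` and `via_index`.

```python
import pandas as pd

# Sample rows from the question; only the columns that are spelled out there.
df = pd.DataFrame({
    "threshold": [5, 4, 7],
    "value1": [12, 1, 5],
    "value2": [3, 7, 2],
    "value3": [4, 8, 8],
    "valueN": [20, 3, 10],
})

# Variant 1: compare every column with the threshold column, row by row.
# axis=0 aligns the threshold Series with the row index.
# The threshold column is never less than itself, so it is left untouched.
masked = df.mask(df.lt(df["threshold"], axis=0), 0)

# Variant 2: move the threshold into the index, compare, then restore it.
indexed = df.set_index("threshold")
via_index = indexed.mask(indexed.lt(indexed.index, axis=0), 0).reset_index()

print(masked)
print(via_index)
```

Both frames should hold the same values; the second variant just keeps the threshold out of the comparison entirely.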
For better performance, use a NumPy solution:
arr = df.values
df = pd.DataFrame(np.where(arr < arr[:, 0][:, None], 0, arr), columns=df.columns)
print (df)
   threshold  value1  value2  value3  valueN
0          5      12       0       0      20
1          4       0       7       8       0
2          7       0       0       8      10
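The NumPy version relies on broadcasting, which may be easier to follow when the steps are split up. This is only a sketch on the question's sample rows (elided columns omitted); the intermediate names `thresholds` and `below` are mine, not part of the solution above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "threshold": [5, 4, 7],
    "value1": [12, 1, 5],
    "value2": [3, 7, 2],
    "value3": [4, 8, 8],
    "valueN": [20, 3, 10],
})

arr = df.to_numpy()

# First column as a column vector: shape (3,) becomes shape (3, 1).
thresholds = arr[:, 0][:, None]

# Broadcasting stretches the (3, 1) vector across all 5 columns,
# so each row is compared with its own threshold.
below = arr < thresholds  # boolean array, shape (3, 5)

# Put 0 where the value is below the threshold, keep the value otherwise.
result = pd.DataFrame(np.where(below, 0, arr), columns=df.columns)
print(result)
```

Without the `[:, None]` step the comparison would try to match the thresholds against columns rather than rows, which is why the extra axis is needed.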
Timings:
In [294]: %timeit set_reset_sol(df)
1 loop, best of 3: 376 ms per loop
In [295]: %timeit numpy_sol(df)
10 loops, best of 3: 59.9 ms per loop
In [296]: %timeit df.mask(df.lt(df['threshold'], axis=0), 0)
1 loop, best of 3: 380 ms per loop
In [297]: %timeit df.iloc[:,1:] = df.iloc[:,1:].apply(lambda x: np.where(x > df.threshold, x, 0), axis=0)
1 loop, best of 3: 449 ms per loop
import numpy as np
import pandas as pd

np.random.seed(234)
N = 100000
#[100000 rows x 100 columns]
df = pd.DataFrame(np.random.randint(100, size=(N, 100)))
df.columns = ['threshold'] + df.columns[1:].tolist()
print (df)
def set_reset_sol(df):
    df = df.set_index('threshold')
    return df.mask(df.lt(df.index, axis=0), 0).reset_index()

def numpy_sol(df):
    arr = df.values
    return pd.DataFrame(np.where(arr < arr[:, 0][:, None], 0, arr), columns=df.columns)
Upvotes: 2
Reputation: 12417
You can try it this way:
df.iloc[:,1:] = df.iloc[:,1:].apply(lambda x: np.where(x >= df.threshold, x, 0), axis=0)
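With this column-wise approach, the choice of comparison operator decides what happens to values exactly equal to the threshold. The question asks to zero only values less than the threshold, so equal values should survive. A small sketch to check that edge case follows; the helper `zero_below_threshold`, its `strict` flag, and the two-row frame are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

def zero_below_threshold(frame, strict):
    """Column-wise apply + np.where; `strict` picks > instead of >=."""
    out = frame.copy()
    thr = out["threshold"]
    if strict:
        keep = lambda x: x > thr   # values equal to the threshold are zeroed
    else:
        keep = lambda x: x >= thr  # values equal to the threshold are kept
    out.iloc[:, 1:] = out.iloc[:, 1:].apply(
        lambda x: np.where(keep(x), x, 0), axis=0
    )
    return out

# Hypothetical frame containing values that equal their row's threshold.
df = pd.DataFrame({"threshold": [5, 4], "value1": [5, 3], "value2": [6, 4]})

strict_result = zero_below_threshold(df, strict=True)
inclusive_result = zero_below_threshold(df, strict=False)
print(strict_result)
print(inclusive_result)
```

Only the inclusive form matches the "less than threshold" wording of the question; for data where no value ever equals its threshold, the two behave the same.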
Upvotes: 2