koch_kir

Reputation: 163

Apply function to dataframe row in pandas based on value in specific column

Suppose I have a pandas DataFrame whose first column is a threshold:

threshold,value1,value2,value3,...,valueN
5,12,3,4,...,20
4,1,7,8,...,3
7,5,2,8,...,10

For each row, I want to set the elements in columns value1..valueN to zero if they are less than that row's threshold:

threshold,value1,value2,value3,...,valueN
5,12,0,0,...,20
4,0,7,8,...,0
7,0,0,8,...,10

How can I do this without explicit for loops?

Upvotes: 3

Views: 992

Answers (2)

jezrael

Reputation: 863801

Use DataFrame.lt for the comparison, together with mask:

df = df.mask(df.lt(df['threshold'], axis=0), 0)
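
A minimal sketch of this one-liner on the question's sample data. The literal "..." column is left out, and the names `below` and `result` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "threshold": [5, 4, 7],
    "value1": [12, 1, 5],
    "value2": [3, 7, 2],
    "value3": [4, 8, 8],
    "valueN": [20, 3, 10],
})

# Compare every column with the row's threshold.
# axis=0 aligns the threshold Series on the row index instead of the columns.
below = df.lt(df["threshold"], axis=0)

# mask replaces the cells where the condition is True, here with 0.
result = df.mask(below, 0)
print(result)
```

The threshold column is compared with itself, so it is never strictly less and stays unchanged.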

Or use set_index and reset_index:

df = df.set_index('threshold')
df = df.mask(df.lt(df.index, axis=0), 0).reset_index()
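
The same idea as a self-contained sketch on the question's sample data (the "..." column is left out, and `indexed` and `result` are illustrative names). Moving the threshold into the index keeps it out of the comparison entirely:

```python
import pandas as pd

df = pd.DataFrame({
    "threshold": [5, 4, 7],
    "value1": [12, 1, 5],
    "value2": [3, 7, 2],
    "value3": [4, 8, 8],
    "valueN": [20, 3, 10],
})

# The threshold becomes the row index, so only value columns remain as data.
indexed = df.set_index("threshold")

# Compare each row with its own index label, zero out the smaller cells,
# then move the threshold back to being the first column.
result = indexed.mask(indexed.lt(indexed.index, axis=0), 0).reset_index()
print(result)
```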

For better performance, use a NumPy solution:

arr = df.values
df = pd.DataFrame(np.where(arr < arr[:, 0][:, None], 0, arr), columns=df.columns)

print (df)
   threshold  value1  value2  value3  valueN
0          5      12       0       0      20
1          4       0       7       8       0
2          7       0       0       8      10
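
A sketch of why the `[:, None]` is needed in the NumPy version, again on the question's sample data without the "..." column. It assumes every column is numeric with a common dtype, and passing `index=df.index` is an addition here to keep a non-default index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "threshold": [5, 4, 7],
    "value1": [12, 1, 5],
    "value2": [3, 7, 2],
    "value3": [4, 8, 8],
    "valueN": [20, 3, 10],
})

arr = df.to_numpy()

# arr[:, 0] has shape (3,); adding an axis gives shape (3, 1),
# which broadcasts across the columns so each row uses its own threshold.
thresholds = arr[:, 0][:, None]

# Where a cell is below its row's threshold take 0, otherwise keep the cell.
result = pd.DataFrame(
    np.where(arr < thresholds, 0, arr),
    columns=df.columns,
    index=df.index,
)
print(result)
```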

Timings:

In [294]: %timeit set_reset_sol(df)
1 loop, best of 3: 376 ms per loop

In [295]: %timeit numpy_sol(df)
10 loops, best of 3: 59.9 ms per loop

In [296]: %timeit df.mask(df.lt(df['threshold'], axis=0), 0)
1 loop, best of 3: 380 ms per loop

In [297]: %timeit df.iloc[:,1:] = df.iloc[:,1:].apply(lambda x: np.where(x > df.threshold, x, 0), axis=0)
1 loop, best of 3: 449 ms per loop


Setup used for the timings:

np.random.seed(234)
N = 100000

#[100000 rows x 100 columns] 
df = pd.DataFrame(np.random.randint(100, size=(N, 100)))
df.columns = ['threshold'] + df.columns[1:].tolist()
print (df)

def set_reset_sol(df):
    df = df.set_index('threshold')
    return df.mask(df.lt(df.index, axis=0), 0).reset_index()

def numpy_sol(df):
    arr = df.values
    return pd.DataFrame(np.where(arr < arr[:, 0][:, None], 0, arr), columns=df.columns)
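
Before comparing speed it is worth checking that the two functions return the same values. A rough sketch, using a smaller hypothetical frame (2000 x 20, my choice) so it runs quickly, and the standard `timeit` module instead of the IPython magic:

```python
import timeit

import numpy as np
import pandas as pd


def set_reset_sol(df):
    indexed = df.set_index("threshold")
    return indexed.mask(indexed.lt(indexed.index, axis=0), 0).reset_index()


def numpy_sol(df):
    arr = df.to_numpy()
    return pd.DataFrame(np.where(arr < arr[:, 0][:, None], 0, arr), columns=df.columns)


np.random.seed(234)
# Hypothetical size, much smaller than the benchmark frame above.
small = pd.DataFrame(np.random.randint(100, size=(2000, 20)))
small.columns = ["threshold"] + small.columns[1:].tolist()

# Both approaches should produce identical values.
same = np.array_equal(set_reset_sol(small).to_numpy(), numpy_sol(small).to_numpy())
print("results agree:", same)

# Average wall-clock time per call over a few runs.
runs = 5
for func in (set_reset_sol, numpy_sol):
    seconds = timeit.timeit(lambda: func(small), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1000:.2f} ms per call")
```

The absolute numbers depend on the machine and the pandas version, so only the relative difference is meaningful.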

Upvotes: 2

Joe

Reputation: 12417

You can try it this way:

df.iloc[:,1:] = df.iloc[:,1:].apply(lambda x: np.where(x >= df.threshold, x, 0), axis=0)
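
A sketch of the column-wise apply idea on the question's data, with two deliberate differences. It uses `Series.where` instead of `np.where`, so each column comes back as a Series that keeps the row index, and it compares with `>=`, because the question only zeroes values strictly less than the threshold. The last row is a hypothetical tie case added for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "threshold": [5, 4, 7, 3],   # last row is a made-up tie case
    "value1": [12, 1, 5, 3],
    "value2": [3, 7, 2, 2],
    "value3": [4, 8, 8, 9],
    "valueN": [20, 3, 10, 3],
})

value_cols = list(df.columns[1:])

# For each value column, keep cells that are >= the row's threshold
# and replace the rest with 0.
df[value_cols] = df[value_cols].apply(
    lambda col: col.where(col >= df["threshold"], 0), axis=0
)
print(df)
```

With a strict `>` the 3s in the last row would be zeroed as well, even though they are not less than the threshold.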

Upvotes: 2
