Reputation: 937

apply function pandas dataframe

I'm trying to apply a simple function to a pandas DataFrame. I want to compute a new column called "target" using the formula defined in "my_res" and add it to the DataFrame:

import pandas as pd
df = pd.DataFrame({'ID':['1','2','3'], 'v1': [0,2,3], 'v2':[1,4,5], 'v3':[11,43,52]})
print(df)


def my_res (x,y,z):
    target=(x*z)/y
    return target


df['target'] = df.apply(my_res('v1','v2','v3'),axis=1)
print(df)

And what if I had a formula like this:

def my_res (x,y,z):
    target=(x*z)/y
    check=target-z
    return target

# In this case I want to create two new columns in the df

Upvotes: 1

Answers (2)

jezrael

Reputation: 863651

You can use lambda with column names:

df['target'] = df.apply(lambda x: my_res(x.v1,x.v2,x.v3),axis=1) 
print (df)
  ID  v1  v2  v3  target
0  1   0   1  11     0.0
1  2   2   4  43    21.5
2  3   3   5  52    31.2
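For context, here is a minimal sketch of why the call in the question fails and what the lambda changes. It assumes the same sample data as the question; the names `error` and `target` are only illustrative. Calling `my_res('v1','v2','v3')` runs the function immediately on three strings, before `apply` ever sees it, whereas the lambda is called once per row with that row's values.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'v1': [0, 2, 3],
                   'v2': [1, 4, 5], 'v3': [11, 43, 52]})

def my_res(x, y, z):
    return (x * z) / y

# The question's call evaluates my_res on the column *names* (plain strings),
# so Python tries 'v1' * 'v3' and raises a TypeError.
try:
    my_res('v1', 'v2', 'v3')
except TypeError as exc:
    error = exc
    print('direct call fails:', exc)

# With axis=1, apply passes each row as a Series; the lambda pulls the
# three values out of that row and hands them to my_res.
target = df.apply(lambda row: my_res(row['v1'], row['v2'], row['v3']), axis=1)
print(target)
```

Bracket access (`row['v1']`) and attribute access (`row.v1`) behave the same here; brackets are simply safer when a column name is not a valid Python identifier.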

But a better and faster approach is a vectorized solution, using either the arithmetic operators or the mul, div and sub methods:

df['target'] = df.v1 * df.v3 /df.v2
print (df)
  ID  v1  v2  v3  target
0  1   0   1  11     0.0
1  2   2   4  43    21.5
2  3   3   5  52    31.2

df['target'] = df.v1.mul(df.v3).div(df.v2)
print (df)
  ID  v1  v2  v3  target
0  1   0   1  11     0.0
1  2   2   4  43    21.5
2  3   3   5  52    31.2
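The second formula in the question (the one that also computes `check`) can probably be handled the same way. The following is a sketch under that assumption, not something tested against the asker's real data. The helper name `my_res2` and the column name `check` are hypothetical choices; the row-wise variant relies on `result_type='expand'`, which `DataFrame.apply` supports in pandas 0.23 and later.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'v1': [0, 2, 3],
                   'v2': [1, 4, 5], 'v3': [11, 43, 52]})

# Vectorized: build the second column from the first.
df['target'] = df['v1'] * df['v3'] / df['v2']
df['check'] = df['target'] - df['v3']
print(df)

# Row-wise alternative (slower): return both values as a tuple and let
# apply expand the tuple into two columns.
def my_res2(x, y, z):  # hypothetical name for the two-output formula
    target = (x * z) / y
    check = target - z
    return target, check

both = df.apply(lambda row: my_res2(row['v1'], row['v2'], row['v3']),
                axis=1, result_type='expand')
both.columns = ['target', 'check']
print(both)
```

The vectorized version is the one to prefer; the `apply` variant is shown only for cases where the function cannot easily be written as column arithmetic.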

Timings:

def my_res (x,y,z): 
    target=(x*z)/y 
    return target

#[30000 rows x 4 columns]    
df = pd.concat([df]*10000).reset_index(drop=True)
df['target'] = df.v1.mul(df.v3).div(df.v2)
df['target1'] = df.apply(lambda x: my_res(x.v1,x.v2,x.v3),axis=1) 
print (df)

In [290]: %timeit df.v1.mul(df.v3).div(df.v2)
1000 loops, best of 3: 305 µs per loop

In [291]: %timeit df.apply(lambda x: my_res(x.v1,x.v2,x.v3),axis=1)
1 loop, best of 3: 1.66 s per loop

In [292]: %timeit df.v1 * df.v3 / df.v2
1000 loops, best of 3: 562 µs per loop
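The `%timeit` lines above need IPython. Outside of it, a rough comparison can be sketched with the standard-library `timeit` module. This sketch assumes a smaller frame (3000 rows rather than 30000) to keep the run short, and the variable names are illustrative; absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

def my_res(x, y, z):
    return (x * z) / y

small = pd.DataFrame({'ID': ['1', '2', '3'], 'v1': [0, 2, 3],
                      'v2': [1, 4, 5], 'v3': [11, 43, 52]})
# 3000 rows: an assumption made to keep this quick, not the answer's size.
big = pd.concat([small] * 1000).reset_index(drop=True)

runs = 5
# Average seconds per call for the vectorized and the row-wise version.
vectorized_s = timeit.timeit(
    lambda: big.v1.mul(big.v3).div(big.v2), number=runs) / runs
rowwise_s = timeit.timeit(
    lambda: big.apply(lambda r: my_res(r.v1, r.v2, r.v3), axis=1),
    number=runs) / runs

print('vectorized: %.6f s per call' % vectorized_s)
print('apply:      %.6f s per call' % rowwise_s)
```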

Upvotes: 1

Ted Petrou

Reputation: 62037

There is no reason to use apply here. A simple vectorized operation will work.

df.v1 * df.v3 / df.v2

Upvotes: 0
