Reputation: 937
I'm trying to apply a simple function to a pandas dataframe. I want to compute a variable called "target" from the formula defined in "my_res" and add it to the dataframe as a new column.
import pandas as pd
df = pd.DataFrame({'ID':['1','2','3'], 'v1': [0,2,3], 'v2':[1,4,5], 'v3':[11,43,52]})
print(df)
def my_res(x, y, z):
    target = (x * z) / y
    return target
df['target'] = df.apply(my_res('v1','v2','v3'),axis=1)
print(df)
And what if I had a formula like this:
def my_res(x, y, z):
    target = (x * z) / y
    check = target - z
    return target
# in this case I want to create 2 variables in the df
Upvotes: 1
Views: 628
Reputation: 863651
You can use a lambda with column names:
df['target'] = df.apply(lambda x: my_res(x.v1,x.v2,x.v3),axis=1)
print (df)
  ID  v1  v2  v3  target
0  1   0   1  11     0.0
1  2   2   4  43    21.5
2  3   3   5  52    31.2
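For the second formula in the question, where both target and check are wanted, one possible sketch is to return both values and let apply expand them into columns. This is my reading of the intent, since the function in the question only returns target. The name my_res2 is made up for illustration, and result_type='expand' assumes pandas 0.23 or later.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'v1': [0, 2, 3],
                   'v2': [1, 4, 5], 'v3': [11, 43, 52]})

def my_res2(x, y, z):
    # hypothetical variant of my_res that returns both values
    target = (x * z) / y
    check = target - z
    return target, check

# result_type='expand' turns each returned tuple into one row of a
# two-column DataFrame (columns are labelled 0 and 1)
res = df.apply(lambda row: my_res2(row.v1, row.v2, row.v3),
               axis=1, result_type='expand')
df['target'] = res[0]
df['check'] = res[1]
print(df)
```

This still calls the function once per row, so it is as slow as any other row-wise apply.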
But it is better and faster to use a vectorized solution with mul, div and sub:
df['target'] = df.v1 * df.v3 /df.v2
print (df)
  ID  v1  v2  v3  target
0  1   0   1  11     0.0
1  2   2   4  43    21.5
2  3   3   5  52    31.2
df['target'] = df.v1.mul(df.v3).div(df.v2)
print (df)
  ID  v1  v2  v3  target
0  1   0   1  11     0.0
1  2   2   4  43    21.5
2  3   3   5  52    31.2
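The same approach should cover the second formula from the question. A minimal sketch, assuming check is meant to become its own column next to target (the question does not say what it should be called in the dataframe):

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'v1': [0, 2, 3],
                   'v2': [1, 4, 5], 'v3': [11, 43, 52]})

# target = (x * z) / y, computed on whole columns at once
df['target'] = df['v1'].mul(df['v3']).div(df['v2'])
# check = target - z; sub is the method form of the "-" operator
df['check'] = df['target'].sub(df['v3'])
print(df)
```

Writing df['target'] - df['v3'] gives the same result; the method form is mostly useful when you need extra arguments such as fill_value.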
Timings:
def my_res(x, y, z):
    target = (x * z) / y
    return target
#[30000 rows x 4 columns]
df = pd.concat([df]*10000).reset_index(drop=True)
df['target'] = df.v1.mul(df.v3).div(df.v2)
df['target1'] = df.apply(lambda x: my_res(x.v1,x.v2,x.v3),axis=1)
print (df)
In [290]: %timeit df.v1.mul(df.v3).div(df.v2)
1000 loops, best of 3: 305 µs per loop
In [291]: %timeit df.apply(lambda x: my_res(x.v1,x.v2,x.v3),axis=1)
1 loop, best of 3: 1.66 s per loop
In [292]: %timeit df.v1 * df.v3 / df.v2
1000 loops, best of 3: 562 µs per loop
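If you are not in IPython, a rough equivalent of the %timeit comparison can be sketched with the standard timeit module. The helper names vectorized and row_wise are mine, and I use a smaller frame (3000 rows) so it finishes quickly; the absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

def my_res(x, y, z):
    return (x * z) / y

small = pd.DataFrame({'ID': ['1', '2', '3'], 'v1': [0, 2, 3],
                      'v2': [1, 4, 5], 'v3': [11, 43, 52]})
# 3000 rows instead of 30000 to keep the run short
df = pd.concat([small] * 1000).reset_index(drop=True)

def vectorized():
    # whole-column arithmetic
    return df.v1.mul(df.v3).div(df.v2)

def row_wise():
    # one Python function call per row
    return df.apply(lambda x: my_res(x.v1, x.v2, x.v3), axis=1)

t_vec = timeit.timeit(vectorized, number=3)
t_apply = timeit.timeit(row_wise, number=3)
print('vectorized: %.4f s, apply: %.4f s' % (t_vec, t_apply))
```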
Upvotes: 1
Reputation: 62037
There is no reason to use apply here. A simple vectorized operation will work.
df.v1 * df.v3 / df.v2
Upvotes: 0