Reputation: 953
I am new to Python and I am not sure how to solve the following problem.
I have a function:
def EOQ(D,p,ck,ch):
    Q = math.sqrt((2*D*ck)/(ch*p))
    return Q
Say I have the dataframe
df = pd.DataFrame({"D": [10,20,30], "p": [20, 30, 10]})
    D   p
0  10  20
1  20  30
2  30  10
ch=0.2
ck=5
And ch and ck are float types. Now I want to apply the formula to every row of the dataframe and return the result as an extra column 'Q'. An example (that does not work) would be:
df['Q']= map(lambda p, D: EOQ(D,p,ck,ch),df['p'], df['D'])
(returns only 'map' types)
I will need this type of processing more in my project and I hope to find something that works.
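On the map attempt above: in Python 3, map returns a lazy iterator rather than a list, which is why the result shows up as a 'map' object. A minimal sketch, assuming the same EOQ, ch and ck as in the question, that materializes the iterator with list before assigning it:

```python
import math

import pandas as pd


def EOQ(D, p, ck, ch):
    return math.sqrt((2 * D * ck) / (ch * p))


ch = 0.2
ck = 5
df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})

# map() is lazy in Python 3; list() forces evaluation so that a
# sequence of numbers, not a map object, is assigned to the column.
df["Q"] = list(map(lambda p, D: EOQ(D, p, ck, ch), df["p"], df["D"]))
```

This keeps the row-by-row approach, so it is only meant to show why the original line did not produce numbers.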
Upvotes: 94
Views: 158195
Reputation:
There are a few more ways to apply a function on every row of a DataFrame.
(1) You could modify EOQ a bit by letting it accept a row (a Series object) as an argument and access the relevant elements by column name inside the function. Moreover, you can pass extra arguments to apply as keyword arguments, e.g. ch or ck:
def EOQ1(row, ck, ch):
    Q = math.sqrt((2*row['D']*ck)/(ch*row['p']))
    return Q
df['Q1'] = df.apply(EOQ1, ck=ck, ch=ch, axis=1)
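As a small illustration of the argument passing described in (1), DataFrame.apply also accepts extra positional arguments through its args parameter. A sketch, assuming the same data and constants as in the question (the column names Q1_args and Q1_kwargs are only for this example):

```python
import math

import pandas as pd


def EOQ1(row, ck, ch):
    # row is a Series holding one DataFrame row; columns are read by name.
    return math.sqrt((2 * row["D"] * ck) / (ch * row["p"]))


ch = 0.2
ck = 5
df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})

# Extra positional arguments are forwarded through `args`, in order.
df["Q1_args"] = df.apply(EOQ1, axis=1, args=(ck, ch))

# Keyword arguments are forwarded to the function as well.
df["Q1_kwargs"] = df.apply(EOQ1, axis=1, ck=ck, ch=ch)
```

Both calls should give the same column; the keyword form is less likely to swap ck and ch by accident.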
(2) It turns out that apply is often slower than a list comprehension (in the benchmark below, it's about 20x slower). To use a list comprehension, you could modify EOQ still further so that it accesses elements by position. Then call the function in a loop over the df rows, converted to lists:
def EOQ2(row, ck, ch):
    Q = math.sqrt((2*row[0]*ck)/(ch*row[1]))
    return Q
df['Q2a'] = [EOQ2(x, ck, ch) for x in df[['D','p']].to_numpy().tolist()]
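A variant that is not part of the benchmark below: if you would rather keep the original EOQ signature, the list comprehension can iterate over the two columns in parallel with zip. A sketch under that assumption (the column name Q_zip is hypothetical):

```python
import math

import pandas as pd


def EOQ(D, p, ck, ch):
    return math.sqrt((2 * D * ck) / (ch * p))


ch = 0.2
ck = 5
df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})

# zip pairs up the values of the two columns row by row, so the
# original four-argument EOQ can be called without modification.
df["Q_zip"] = [EOQ(D, p, ck, ch) for D, p in zip(df["D"], df["p"])]
```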
(3) As it happens, if the goal is to call a function iteratively, map is usually faster than a list comprehension. So you could convert df into a list and map the function over it, then unpack the result into a list:
df['Q2b'] = [*map(EOQ2, df[['D','p']].to_numpy().tolist(), [ck]*len(df), [ch]*len(df))]
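The constant arguments in (3) can also be supplied with itertools.repeat instead of building lists of length len(df). This is a sketch of that idea, not something timed in the benchmark below; it uses the original four-argument EOQ and a hypothetical column name Q_map:

```python
import math
from itertools import repeat

import pandas as pd


def EOQ(D, p, ck, ch):
    return math.sqrt((2 * D * ck) / (ch * p))


ch = 0.2
ck = 5
df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})

# repeat(x) yields x indefinitely; map stops as soon as the shortest
# iterable (here, the two columns) is exhausted.
df["Q_map"] = list(map(EOQ, df["D"], df["p"], repeat(ck), repeat(ch)))
```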
(4) As @EdChum notes, it's always better to use vectorized methods where possible, instead of applying a function row by row. Pandas offers vectorized methods that rival numpy's. In the case of EOQ, for example, instead of math.sqrt you could use pandas' pow method (in the benchmark below, using pandas vectorized methods is ~20% faster than using numpy):
df['Q_pd'] = df['D'].mul(2*ck).div(ch*df['p']).pow(0.5)
Output:
    D   p          Q       Q_np         Q1        Q2a        Q2b       Q_pd
0  10  20   5.000000   5.000000   5.000000   5.000000   5.000000   5.000000
1  20  30   5.773503   5.773503   5.773503   5.773503   5.773503   5.773503
2  30  10  12.247449  12.247449  12.247449  12.247449  12.247449  12.247449
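If you want to confirm programmatically that the vectorized expressions match the row-wise function, rather than comparing the printed columns by eye, something like the following should do. It is a sketch using the same data and constants; the variable names are only for illustration:

```python
import math

import numpy as np
import pandas as pd


def EOQ(D, p, ck, ch):
    return math.sqrt((2 * D * ck) / (ch * p))


ch = 0.2
ck = 5
df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})

# Row-wise reference result, then the two vectorized versions.
row_wise = df.apply(lambda row: EOQ(row["D"], row["p"], ck, ch), axis=1)
q_np = np.sqrt((2 * df["D"] * ck) / (ch * df["p"]))
q_pd = df["D"].mul(2 * ck).div(ch * df["p"]).pow(0.5)

# allclose compares with a floating-point tolerance instead of exact equality.
all_agree = np.allclose(row_wise, q_np) and np.allclose(row_wise, q_pd)
```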
Timings:
df = pd.DataFrame({"D": [10,20,30], "p": [20, 30, 10]})
df = pd.concat([df]*10000)
>>> %timeit df['Q'] = df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)
623 ms ± 22.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit df['Q1'] = df.apply(EOQ1, ck=ck, ch=ch, axis=1)
615 ms ± 39.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit df['Q2a'] = [EOQ2(x, ck, ch) for x in df[['D','p']].to_numpy().tolist()]
31.3 ms ± 479 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df['Q2b'] = [*map(EOQ2, df[['D','p']].to_numpy().tolist(), [ck]*len(df), [ch]*len(df))]
26.9 ms ± 306 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df['Q_np'] = np.sqrt((2*df['D']*ck)/(ch*df['p']))
1.19 ms ± 53.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df['Q_pd'] = df['D'].mul(2*ck).div(ch*df['p']).pow(0.5)
966 µs ± 27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
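%timeit only works in IPython/Jupyter. Outside of it, a rough equivalent can be put together with the standard timeit module. The following is a sketch on a smaller frame (3,000 rows, an arbitrary choice) with a hypothetical helper best_time; absolute numbers will differ from the ones above and from machine to machine:

```python
import math
import timeit

import numpy as np
import pandas as pd


def EOQ(D, p, ck, ch):
    return math.sqrt((2 * D * ck) / (ch * p))


ch = 0.2
ck = 5
small = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
df = pd.concat([small] * 1000, ignore_index=True)  # 3,000 rows


def best_time(func, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` single runs."""
    return min(timeit.repeat(func, number=1, repeat=repeat))


t_apply = best_time(
    lambda: df.apply(lambda row: EOQ(row["D"], row["p"], ck, ch), axis=1)
)
t_numpy = best_time(lambda: np.sqrt((2 * df["D"] * ck) / (ch * df["p"])))

print(f"apply: {t_apply:.4f} s, numpy: {t_numpy:.6f} s")
```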
Upvotes: 9
Reputation: 394439
The following should work:
def EOQ(D,p,ck,ch):
    Q = math.sqrt((2*D*ck)/(ch*p))
    return Q
ch=0.2
ck=5
df['Q'] = df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)
df
If all you're doing is calculating the square root of some result, then use the np.sqrt method; it is vectorised and will be significantly faster:
In [80]:
df['Q'] = np.sqrt((2*df['D']*ck)/(ch*df['p']))
df
Out[80]:
    D   p          Q
0  10  20   5.000000
1  20  30   5.773503
2  30  10  12.247449
Timings
For a 30k row df:
In [92]:
import math
ch=0.2
ck=5
def EOQ(D,p,ck,ch):
    Q = math.sqrt((2*D*ck)/(ch*p))
    return Q
%timeit np.sqrt((2*df['D']*ck)/(ch*df['p']))
%timeit df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)
1000 loops, best of 3: 622 µs per loop
1 loops, best of 3: 1.19 s per loop
You can see that the np method is ~1900x faster (622 µs versus 1.19 s).
Upvotes: 125