I have a pandas DataFrame with columns `idx`, `grp`, `X` and `Y`, and I want to get a new column with the cumulative integral of a function of `Y` with respect to `X`. However, I want to apply this cumulative integration separately to each subgroup of the DataFrame, as defined by the column `grp`.
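Written out for a single group whose rows are sorted by `X`, the quantity being computed is the cumulative trapezoidal rule (what SciPy's `cumtrapz`/`cumulative_trapezoid` implements with `initial=0`), here with $A_0 = 200$ and $n = 0.5$:

$$
Z_0 = 0, \qquad Z_i = \sum_{k=1}^{i} \frac{f_{k-1} + f_k}{2}\,(X_k - X_{k-1}), \qquad f_k = \left(\frac{A_0}{Y_k}\right)^{n}
$$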
Here's what I'm doing:
```python
import numpy as np
import pandas as pd
from scipy import integrate

def myIntegral(DF, n):
    A0 = 200
    # cumulative_trapezoid is the current name; cumtrapz was removed in SciPy 1.14
    return integrate.cumulative_trapezoid((A0/DF.Y)**n, DF.X, initial=0)

data = pd.DataFrame({'idx' : [1,2,3,4,5,6],
                     'grp' : [2,2,2,2,3,3],
                     'X' : [.1,.2,.3,.4,.2,.3],
                     'Y' : [3,4,4,3,2,3]})
data.sort_values(by=['grp', 'X'], inplace=True)
out = data.groupby('grp').apply(myIntegral, n=0.5)
```
`out` is a Series of ndarrays, one per value of `grp`, which I need to map back into the DataFrame:
```python
data_grouped = data.groupby('grp')
out2 = []
for grp, DF in data_grouped:
    DF['Z'] = out.loc[grp]
    out2.append(DF)
data = pd.concat(out2)
```
It works, but the detour through a Series of ndarrays seems really ugly and error-prone. Any suggestions on how to improve this? Also, the data sets I'll be working with are rather big, so I am looking for an efficient solution.
Thanks!
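If the Series of arrays is kept, one way to shorten the mapping step is to concatenate the arrays directly instead of looping. This is a sketch that rests on an assumption worth stating loudly: `data` must be sorted by `grp`, because `groupby` returns the Series ordered by sorted group key and the concatenated values are assigned by position. It also assumes SciPy 1.6 or newer for `cumulative_trapezoid`.

```python
import numpy as np
import pandas as pd
from scipy import integrate

def myIntegral(DF, n):
    A0 = 200
    return integrate.cumulative_trapezoid((A0 / DF.Y) ** n, DF.X, initial=0)

data = pd.DataFrame({'idx': [1, 2, 3, 4, 5, 6],
                     'grp': [2, 2, 2, 2, 3, 3],
                     'X': [.1, .2, .3, .4, .2, .3],
                     'Y': [3, 4, 4, 3, 2, 3]})
data = data.sort_values(by=['grp', 'X'])

out = data.groupby('grp').apply(myIntegral, n=0.5)
# out is ordered by sorted grp; since data is sorted by grp too, gluing the
# per-group arrays end to end lines them up row by row with data
data['Z'] = np.concatenate(out.to_list())
print(data)
```

If the sort order is ever not guaranteed, this silently misaligns values, so the loop (or the answer's approach) is the safer choice in that case.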
You can change your function so that it creates the new column and returns the whole group `DF`, like this:
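```python
import pandas as pd
from scipy import integrate

def myIntegral(DF, n):
    A0 = 200
    # return a copy of the group with the cumulative integral added as column 'new'
    # (assign avoids mutating the group object that apply passes in)
    return DF.assign(new=integrate.cumulative_trapezoid((A0/DF.Y)**n, DF.X, initial=0))

data = pd.DataFrame({'idx' : [1,2,3,4,5,6],
                     'grp' : [2,2,2,2,3,3],
                     'X' : [.1,.2,.3,.4,.2,.3],
                     'Y' : [3,4,4,3,2,3]})
data.sort_values(by=['grp', 'X'], inplace=True)
# group_keys=False keeps the original row index; without it, pandas >= 2.0
# prepends grp as an extra index level
out = data.groupby('grp', group_keys=False).apply(myIntegral, n=0.5)
print(out)
```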
```
   idx  grp    X  Y       new
0    1    2  0.1  3  0.000000
1    2    2  0.2  4  0.761802
2    3    2  0.3  4  1.468908
3    4    2  0.4  3  2.230710
4    5    3  0.2  2  0.000000
5    6    3  0.3  3  0.908248
```
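Since the data sets are large, it may also be worth avoiding a Python function call per group entirely. The sketch below swaps in a different technique from `groupby().apply()`: it applies the trapezoidal rule directly with grouped `diff`, `shift` and `cumsum`, so the work stays in vectorized pandas operations. It assumes rows are sorted by `grp` and then `X`, and that `X` and `Y` contain no missing values; `A0` and `n` are taken from the example and the column name `new` mirrors the answer.

```python
import pandas as pd

A0, n = 200, 0.5

data = pd.DataFrame({'idx': [1, 2, 3, 4, 5, 6],
                     'grp': [2, 2, 2, 2, 3, 3],
                     'X': [.1, .2, .3, .4, .2, .3],
                     'Y': [3, 4, 4, 3, 2, 3]})
data = data.sort_values(by=['grp', 'X'])

f = (A0 / data['Y']) ** n                   # integrand at every row
dx = data.groupby('grp')['X'].diff()        # X_k - X_{k-1}; NaN on each group's first row
f_prev = f.groupby(data['grp']).shift()     # f_{k-1}; NaN on each group's first row
# trapezoid area of each interval; the NaN on group starts becomes 0
# (assumes no missing X/Y, otherwise real gaps would be zeroed too)
area = ((f + f_prev) / 2 * dx).fillna(0)
data['new'] = area.groupby(data['grp']).cumsum()
print(data)
```

On the example this reproduces the `new` column printed above to the shown precision; whether it is actually faster than `apply` on real data depends on the number and size of the groups, so it is worth timing both.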