Reputation: 880
I would like to apply a function f1 by group to a dataframe:
import pandas as pd
import numpy as np
data = np.array([['id1','id2','u','v0','v1'],
['A','A',10,1,7],
['A','A',10,2,8],
['A','B',20,3,9],
['B','A',10,4,10],
['B','B',30,5,11],
['B','B',30,6,12]])
z = pd.DataFrame(data = data[1:,:], columns=data[0,:])
def f1(u,v):
return u*np.cumprod(v)
The result of the function depends on the column u and columns v0 or v1 (that can be thousands of v ecause I'm doing a simulation on a lot of paths).
The result should be like this
id1 id2 new_v0 new_v1
0 A A 10 70
1 A A 20 560
2 A B 60 180
3 B A 40 100
4 B B 150 330
5 B B 900 3960
I tried for a start
output = z.groupby(['id1', 'id2']).apply(lambda x: f1(u = x.u,v =x.v0))
but I can't even get a result with just one column.
Thank you very much!
Upvotes: 2
Views: 175
Reputation: 19947
If you want to handle more than 2 v columns, it's better not to reference it.
(
z.apply(lambda x: pd.to_numeric(x, errors='ignore'))
.groupby(['id1', 'id2']).apply(lambda x: x.cumprod().mul(x.u.min()))
)
Upvotes: 0
Reputation: 75080
You can filter column names starting with v
and create a list and pass them under groupby:
v_cols = z.columns[z.columns.str.startswith('v')].tolist()
z[['u']+v_cols] = z[['u']+v_cols].apply(pd.to_numeric)
out = z.assign(**z.groupby(['id1','id2'])[v_cols].cumprod()
.mul(z['u'],axis=0).add_prefix('new_'))
print(out)
id1 id2 u v0 v1 new_v0 new_v1
0 A A 10 1 7 10 70
1 A A 10 2 8 20 560
2 A B 20 3 9 60 180
3 B A 10 4 10 40 100
4 B B 30 5 11 150 330
5 B B 30 6 12 900 3960
Upvotes: 2
Reputation: 323226
The way you create your data frame , will make the numeric
to object
, we convert first , then use the groupby
+ cumprod
z[['u','v0','v1']]=z[['u','v0','v1']].apply(pd.to_numeric)
s=z.groupby(['id1','id2'])[['v0','v1']].cumprod().mul(z['u'],0)
#z=z.join(s.add_prefix('New_'))
v0 v1
0 10 70
1 20 560
2 60 180
3 40 100
4 150 330
5 900 3960
Upvotes: 2