Reputation: 21
The basic idea is that I have a computation that involves multiple columns from a dataframe and returns multiple columns, which I'd like to integrate in the dataframe. I'd like to do something like this:
import pandas as pd

df = pd.DataFrame({'id': ['i1', 'i1', 'i2', 'i2'], 'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
def custom_f(a, b):
    computation = a + b  # stands in for the expensive step
    return computation + 1, computation * 2
df['c1'], df['c2'] = df.groupby('id').apply(lambda x: custom_f(x.a, x.b))
Desired output:
   id  a  b  c1  c2
0  i1  1  5   7  12
1  i1  2  6   9  16
2  i2  3  7  11  20
3  i2  4  8  13  24
I know how I could do this one column at a time, but in reality the 'computation' operation using the two columns is quite expensive so I'm trying to figure out how I could only run it once.
EDIT: I realised that the given example can be solved without the groupby, but for my use case for the actual 'computation' I'm doing the groupby because I'm using the first and last values of arrays in each group for my computation. For the sake of simplicity I omitted that, but imagine that it is needed.
Upvotes: 2
Views: 2982
Reputation: 24314
You can try returning both results as a single two-column DataFrame:
def custom_f(a, b):
    computation = a + b
    return pd.concat([computation + 1, computation * 2], axis=1)
Finally:
df[['c1','c2']]=df.groupby('id').apply(lambda x: custom_f(x.a, x.b)).values
Output of df:
   id  a  b  c1  c2
0  i1  1  5   7  12
1  i1  2  6   9  16
2  i2  3  7  11  20
3  i2  4  8  13  24
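One caveat: `.values` assigns by position, so the approach above relies on the grouped result coming back in the same row order as df, which holds here because df is already sorted by id. A minimal sketch of an index-aligned variant follows, assuming a reasonably recent pandas. The shuffled sample frame, the `custom_f(group)` signature, and the `result` and `out` names are illustrative choices, not part of the original question.

```python
import pandas as pd

# Same data as the question, but deliberately NOT sorted by id
df = pd.DataFrame({'id': ['i2', 'i1', 'i2', 'i1'],
                   'a': [3, 1, 4, 2],
                   'b': [7, 5, 8, 6]})

def custom_f(group):
    # Stand-in for the expensive step; it runs once per group
    computation = group['a'] + group['b']
    # Both outputs keep the group's original row index
    return pd.DataFrame({'c1': computation + 1, 'c2': computation * 2})

# Select only the columns the function needs; group_keys=False keeps
# the original row labels instead of prepending the group key
result = df.groupby('id', group_keys=False)[['a', 'b']].apply(custom_f)

# join aligns on the index, so row order in `result` does not matter
out = df.join(result)
print(out)
```

Because the join matches rows by index label rather than by position, this should behave the same whether or not the frame is sorted by the grouping column.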
Upvotes: 2
Reputation: 724
df['c1'], df['c2'] = custom_f(df['a'], df['b'])  # you don't need apply for your desired output here
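If the real computation needs each group's first and last values, as the question's EDIT describes, one possible way to keep this apply-free style is to broadcast those values to every row with `transform` and then run the computation once over whole columns. This is only a sketch: the `last_a - first_a` term and the extra `first_a` and `last_a` parameters are hypothetical stand-ins, since the actual computation is not shown in the question.

```python
import pandas as pd

df = pd.DataFrame({'id': ['i1', 'i1', 'i2', 'i2'],
                   'a': [1, 2, 3, 4],
                   'b': [5, 6, 7, 8]})

def custom_f(a, b, first_a, last_a):
    # Hypothetical computation that uses per-group first/last values
    computation = a + b + (last_a - first_a)
    return computation + 1, computation * 2

grouped_a = df.groupby('id')['a']
# transform returns a Series aligned with df, repeating each group's value
first_a = grouped_a.transform('first')
last_a = grouped_a.transform('last')

# The computation runs once, on full columns, and both outputs are unpacked
df['c1'], df['c2'] = custom_f(df['a'], df['b'], first_a, last_a)
print(df)
```

Whether this fits depends on the real computation: it works when the group-dependent part can be reduced to a few per-group values, and not when the function needs the whole group at once.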
Upvotes: 1