Reputation: 189
I have the following data frame
data =
      a     b     c  val
0  'a1'  'b1'  'c1'    5
1  'a1'  'b1'  'c2'   10
2  'a1'  'b2'  'c3'    1
3  'a2'  'b1'  'c4'   30
4  'a2'  'b1'  'c5'   20
5  'a2'  'b1'  'c6'   10
I want to create a new data frame with the same columns, where the value for a given (a, b, c) equals the sum of the values of all (aa, bb, cc) with a == aa, b == bb, and value(aa, bb, cc) >= value(a, b, c). Here aa is an alias for a, and so on.
The desired outcome is
data_new =
      a     b     c  val
0  'a1'  'b1'  'c1'   15
1  'a1'  'b1'  'c2'   10
2  'a1'  'b2'  'c3'    1
3  'a2'  'b1'  'c4'   30
4  'a2'  'b1'  'c5'   50
5  'a2'  'b1'  'c6'   60
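For reference, the sample frame can be built like this (assuming pandas is imported as pd):

import pandas as pd

data = pd.DataFrame({
    'a': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2'],
    'b': ['b1', 'b1', 'b2', 'b1', 'b1', 'b1'],
    'c': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6'],
    'val': [5, 10, 1, 30, 20, 10],
})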
Currently, I do the following:
# Build a composite key so rows can be matched on (a, b) with one comparison.
data['key'] = data['a'] + data['b']
data_new = data.copy()
# For each row, sum val over all rows sharing its key whose val is >= its own.
data_new['val'] = data.apply(
    lambda row: data[(data.key == row.key) & (data.val >= row.val)].val.sum(),
    axis=1)
del data_new['key']
del data['key']  # remove the helper column from the original frame too
Actually, in my real data I have more columns like a and b, which is why a chain of boolean comparisons such as
(data.a == row.a) & (data.b == row.b) & ...
is slower. The problem is that even the current implementation is not as fast as I would like, and I have huge tables. Is there a way to do the same thing faster and more efficiently?
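For what it's worth, grouping on a list of columns avoids building a concatenated key at all; here is a sketch of the same per-row logic expressed per group (still quadratic within each group, but it no longer scans the whole frame for every row):

# Group on the key columns directly instead of concatenating them;
# in the real data the list would include the extra key columns.
keys = ['a', 'b']
data_new = data.copy()
data_new['val'] = data.groupby(keys)['val'].transform(
    lambda s: s.apply(lambda v: s[s >= v].sum()))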
Upvotes: 0
Views: 2117
Reputation: 353159
If I understand you correctly, you might be able to use cumsum with a bit of sorting:
>>> grouped = df.sort_values("val", ascending=False).groupby(["a", "b"])
>>> df["new_val"] = grouped["val"].cumsum()
>>> df
    a   b   c  val  new_val
0  a1  b1  c1    5       15
1  a1  b1  c2   10       10
2  a1  b2  c3    1        1
3  a2  b1  c4   30       30
4  a2  b1  c5   20       50
5  a2  b1  c6   10       60
Since you want to sum the values in a group that are >= the value of interest, you can put the values in descending order within each (a, b) group and then take the cumulative sum. The result keeps the original row index, so assigning it back to df lines each sum up with the row it was computed for.
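One caveat: if a group contains tied val values, the cumulative sum gives each tied row a different total, whereas the >= definition in the question gives them the same total (the sample data has no ties, so the outputs match). Here is a sketch of a tie-robust variant of the same sort-and-cumsum idea, continuing from the df above:

# Sort so the cumulative sum runs from the largest value down.
tmp = df.sort_values("val", ascending=False)
cs = tmp.groupby(["a", "b"])["val"].cumsum()
# Within each (a, b, val) set of tied rows, give every row the full
# running total, i.e. the largest cumsum among the ties.
df["new_val"] = cs.groupby([tmp["a"], tmp["b"], tmp["val"]]).transform("max")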
Upvotes: 2