How to improve the speed of groupby/transform?

Question

I want to implement a groupmax function that finds the maximum value within each group and assigns it back to every row in that group. It seems groupby('name').transform(max) is what I need. For example:

In [1]: print df
  name     value
0    0  0.363030
1    0  0.324828
2    0  0.499279
3    1  0.799836
4    1  0.886653
5    1  0.335056

In [2]: print df.groupby('name').transform(max)
      value
0  0.499279
1  0.499279
2  0.499279
3  0.886653
4  0.886653
5  0.886653
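For anyone who wants to reproduce the small example, here is a minimal self-contained sketch. It assumes Python 3 and a reasonably recent pandas, selects the value column explicitly, and passes the aggregation as the string 'max' instead of the builtin; the variable name group_max is only illustrative.

```python
import pandas as pd

# Same six rows as in the example above.
df = pd.DataFrame({
    "name": [0, 0, 0, 1, 1, 1],
    "value": [0.363030, 0.324828, 0.499279, 0.799836, 0.886653, 0.335056],
})

# transform returns one value per input row: the max of that row's group.
group_max = df.groupby("name")["value"].transform("max")
print(group_max.tolist())
```

The result has the same length and index as df, so it can be assigned straight back as a new column.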

However, this approach becomes very slow when the data frame is large and contains many small groups. For example, the following code, which builds 1,000,000 groups of two rows each, appears to hang indefinitely:

import numpy as np
import pandas as pd
df = pd.DataFrame({'name' : np.repeat([str(x) for x in range(0, 1000000)], 2), 'value' : np.random.rand(2000000)})
print df.groupby('name').transform(max)

Is there a faster solution to this problem?

Thanks a lot!
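One direction that might be worth trying is to aggregate once per group and then broadcast the result back by key lookup, instead of calling a Python function for every group. The sketch below is an assumption about what could help, not a measured result: it uses a scaled-down n_groups so it finishes quickly, and the names per_group_max and group_max are hypothetical. It is written for Python 3 and a recent pandas.

```python
import numpy as np
import pandas as pd

# Scaled-down version of the data above (assumption: 1,000 groups of 2 rows).
n_groups = 1000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "name": np.repeat([str(x) for x in range(n_groups)], 2),
    "value": rng.random(2 * n_groups),
})

# Step 1: one max per group, as a Series indexed by group name.
per_group_max = df.groupby("name")["value"].max()

# Step 2: look up each row's group max by its name.
df["group_max"] = df["name"].map(per_group_max)
```

Whether this actually beats transform depends on the pandas version and the data, so it would need to be timed on the real frame.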

