What is an efficient way of applying a function to one column in a group from a groupby object?

Question

I have a dataframe with 500K rows.

It has the following columns:

               Symbol      Open      High       Low    Close    Volume

Date                                                                    
01-Aug-2017    AADR   49.8800    49.8800    49.8800    49.8800     790
02-Aug-2017    AADR   49.8432    49.8432    49.8432    49.8432     684

I have 2071 symbols in the dataframe:

>>> grouped = df.groupby('Symbol')

>>> len(grouped)

 2071

I want to apply a rolling mean function (here, an exponentially weighted one) to only one column (Close) of each group and add the resulting values as an extra column in the existing dataframe.

I believe I could do the following:

results = {}
for name, group in grouped:
    # group is already the sub-DataFrame for this symbol, so no [1] indexing is needed
    ma_col = group.Close.ewm(span=10, min_periods=10).mean()
    results[name] = ma_col

This gives me a dictionary of results, which I could then turn into a DataFrame.

Is there a more efficient (better performance) way to do the same thing?

cs95 · Accepted Answer

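You can use groupby + transform. The result is a Series aligned with the original index, so it can be assigned directly as a new column: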

df.groupby('Symbol').Close.transform(lambda x: x.ewm(span=10, min_periods=10).mean())
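To show how the transform call above can be used to add the extra column the question asks for, here is a minimal sketch on toy data. The sample values, the second symbol, the helper name add_group_ewm, the column name Close_EWM, and the shorter span of 3 are all assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical toy data standing in for the 500K-row frame in the question.
df = pd.DataFrame(
    {
        "Symbol": ["AADR"] * 4 + ["AAXJ"] * 4,
        "Close": [49.88, 49.84, 50.10, 50.25, 70.00, 70.50, 69.80, 71.20],
    }
)


def add_group_ewm(frame, span=3, min_periods=3):
    """Return a copy of frame with a per-symbol EWM mean of Close.

    The output column name 'Close_EWM' is an illustrative choice.
    """
    out = frame.copy()
    # transform returns a Series with the same index as the input,
    # so it lines up row-for-row with the original frame.
    out["Close_EWM"] = out.groupby("Symbol")["Close"].transform(
        lambda s: s.ewm(span=span, min_periods=min_periods).mean()
    )
    return out


result = add_group_ewm(df)
print(result)
```

Rows that come before min_periods observations within each symbol are NaN, just as they would be in the per-group loop from the question.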
