Optimization of a for loop with pandas

Question

This is a problem I came across recently, and I very much enjoyed designing a solution to it. I think it is a nice exercise and a good example of combining for loops with pandas.

I have a pandas DataFrame, loaded from test.csv and indexed by timestamp, with the columns ['runs', 'Period_change', 'Close', 'sum'].

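For each distinct value of runs, the loop below fills sum with that run's total Period_change divided by the Close on the run's first row. I want to replace this loop with a faster alternative.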

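import pandas as pd

game = pd.read_csv('test.csv').set_index('timestamp').dropna()
for l in game['runs'].unique():
    mask = game['runs'] == l  # rows belonging to run l
    # Use .loc instead of chained indexing (game['sum'][mask] = ...),
    # which may silently fail to write back to game.
    game.loc[mask, 'sum'] = game.loc[mask, 'Period_change'].sum() / game.loc[mask, 'Close'].iloc[0]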
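Since test.csv is not included, here is a minimal sketch of what the loop computes, run on a small hypothetical frame. The timestamps and values below are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for test.csv: two runs (1 and 2).
game = pd.DataFrame(
    {
        "timestamp": [1, 2, 3, 4, 5],
        "runs": [1, 1, 1, 2, 2],
        "Period_change": [0.5, 1.0, -0.5, 2.0, 1.0],
        "Close": [10.0, 11.0, 12.0, 20.0, 21.0],
    }
).set_index("timestamp")

for run in game["runs"].unique():
    mask = game["runs"] == run
    # Every row of this run receives the same value:
    # (sum of Period_change in the run) / (Close on the run's first row)
    game.loc[mask, "sum"] = (
        game.loc[mask, "Period_change"].sum() / game.loc[mask, "Close"].iloc[0]
    )

print(game["sum"].tolist())  # → [0.1, 0.1, 0.1, 0.15, 0.15]
```

Run 1 gives (0.5 + 1.0 - 0.5) / 10.0 = 0.1 and run 2 gives (2.0 + 1.0) / 20.0 = 0.15. The loop rescans the whole frame once per run, which is why it slows down as the number of runs grows.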

BENY · Accepted Answer

IIUC, your for loop can be replaced by a groupby:

game['sum'] = game['runs'].map(game.groupby('runs').apply(lambda x: x['Period_change'].sum() / x['Close'].iloc[0]))
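A further variant, not part of the accepted answer, avoids both the Python-level lambda and the map() step by using groupby transform(), which broadcasts each group's aggregate back onto that group's rows. This is a sketch on the same hypothetical data as above; it is likely faster with many runs because 'sum' and 'first' are built-in aggregations, but that is worth measuring on the real data.

```python
import pandas as pd

# Hypothetical stand-in for test.csv (same made-up values as before).
game = pd.DataFrame(
    {
        "timestamp": [1, 2, 3, 4, 5],
        "runs": [1, 1, 1, 2, 2],
        "Period_change": [0.5, 1.0, -0.5, 2.0, 1.0],
        "Close": [10.0, 11.0, 12.0, 20.0, 21.0],
    }
).set_index("timestamp")

grouped = game.groupby("runs")
# Per-row total of Period_change for the row's run.
run_total = grouped["Period_change"].transform("sum")
# 'first' returns the first non-null Close in each run; because the frame
# was passed through dropna(), this matches .iloc[0] in the loop.
run_first_close = grouped["Close"].transform("first")
game["sum"] = run_total / run_first_close

print(game["sum"].tolist())  # → [0.1, 0.1, 0.1, 0.15, 0.15]
```

groupby keeps the original row order within each group, so 'first' refers to the same row the loop's .iloc[0] picks.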
