Reputation: 23
I have a set of players (common_opps), the size of which changes over time, and I need to take values from a 3D dataframe (df_versus) and return their mean. I am calling this function many times and the execution time grows each time. It is OK for a small number of players, but it gets to the point where this loop iterates over more than 500 players and the wait becomes very long. So I was wondering if there is a way to improve this, for example by replacing the loop with something like lambda functions. I tried Numba, but I can't configure it properly.
def common_opponents(p1, p2):
    # Opponents that both p1 and p2 have faced; wrap in list() so numpy
    # builds a 1-D array of names instead of a 0-d object array.
    common_opps = np.array(list(s_opponents[p1].intersection(s_opponents[p2])))
    serve1, serve2, ace1, ace2, df1, df2, break1, break2 = 0, 0, 0, 0, 0, 0, 0, 0
    length = len(common_opps)
    if length == 0:
        return serve1, serve2, ace1, ace2, df1, df2, break1, break2
    for opponent in common_opps:
        serve1 += df_versus[p1][opponent]["serve_won"] / df_versus[p1][opponent]["serve_total"]
        serve2 += df_versus[p2][opponent]["serve_won"] / df_versus[p2][opponent]["serve_total"]
        ace1 += df_versus[p1][opponent]["ace"] / df_versus[p1][opponent]["serve_total"]
        ace2 += df_versus[p2][opponent]["ace"] / df_versus[p2][opponent]["serve_total"]
        df1 += df_versus[p1][opponent]["df"] / df_versus[p1][opponent]["serve_total"]
        df2 += df_versus[p2][opponent]["df"] / df_versus[p2][opponent]["serve_total"]
        break1 += df_versus[p1][opponent]["break_won"] / df_versus[p1][opponent]["break_total"]
        break2 += df_versus[p2][opponent]["break_won"] / df_versus[p2][opponent]["break_total"]
    # Average each accumulated ratio over the common opponents.
    return (serve1/length, serve2/length, ace1/length, ace2/length,
            df1/length, df2/length, break1/length, break2/length)
p1 and p2 are the names of the players as strings, like 'Roger Federer' and 'Rafael Nadal'; s_opponents1 and s_opponents2 are sets of player names; common_opps is also a set of names; and df_versus is a MultiIndex dataframe made with
versus_index = pd.MultiIndex.from_product([unique_player, ["serve_won", "serve_total", "ace", "df", "break_won", "break_total", "won", "lost"]])
df_versus = pd.DataFrame(0, index=versus_index, columns=unique_player)
It is filled with the proper values over time, and unique_player is a list of the unique players in the whole dataset. If Nadal and Federer had only three common opponents, that is the kind of dataframe I would be working with, with the zeros replaced by their actual stats.
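For illustration, here is a minimal sketch of such a dataframe with three made-up common opponents; the names and numbers below are placeholders, not real stats:

import pandas as pd

unique_player = ["Roger Federer", "Rafael Nadal",
                 "Novak Djokovic", "Andy Murray", "Stan Wawrinka"]
stats = ["serve_won", "serve_total", "ace", "df",
         "break_won", "break_total", "won", "lost"]

versus_index = pd.MultiIndex.from_product([unique_player, stats])
df_versus = pd.DataFrame(0, index=versus_index, columns=unique_player)

# Invented example fill: Federer's serve numbers against Djokovic.
df_versus.loc[("Novak Djokovic", "serve_won"), "Roger Federer"] = 120
df_versus.loc[("Novak Djokovic", "serve_total"), "Roger Federer"] = 180

# This matches the access pattern used in the loop above:
ratio = (df_versus["Roger Federer"]["Novak Djokovic"]["serve_won"]
         / df_versus["Roger Federer"]["Novak Djokovic"]["serve_total"])  # 0.666...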
Upvotes: 1
Views: 80
Reputation: 23
OK, so I figured out how to solve this optimization problem. Instead of the for loop, I added four new ratio entries per player to df_versus (['serve_ratio', 'ace_ratio', 'df_ratio', 'break_ratio']), then used the .apply method to apply np.mean, which is super fast, and returned the result as an array. I think this process is now about 100 times faster:
serve1, ace1, df1, break1, serve2, ace2, df2, break2 = (
    df_versus.loc[([p1, p2], ['serve_ratio', 'ace_ratio', 'df_ratio', 'break_ratio']),
                  common_opps].apply(np.mean, axis=1).to_numpy())
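The post does not show how the ratio entries were precomputed, so below is a minimal sketch of one possible way, assuming the ratios live in the second index level as the .loc call above suggests; the helper name ratio_block and the replace(0, np.nan) guard against division by zero are illustrative choices, not part of the original code:

import numpy as np
import pandas as pd

def ratio_block(df, num_stat, den_stat, name):
    # Cross-sections are player-by-opponent frames, one per stat.
    num = df.xs(num_stat, level=1)
    den = df.xs(den_stat, level=1).replace(0, np.nan)  # avoid division by zero
    block = num / den
    # Re-attach the new stat name as the second index level.
    block.index = pd.MultiIndex.from_product([block.index, [name]])
    return block

ratios = pd.concat([
    ratio_block(df_versus, "serve_won", "serve_total", "serve_ratio"),
    ratio_block(df_versus, "ace",       "serve_total", "ace_ratio"),
    ratio_block(df_versus, "df",        "serve_total", "df_ratio"),
    ratio_block(df_versus, "break_won", "break_total", "break_ratio"),
])
# Sort so the .loc slice over (player, ratio) pairs above stays fast.
df_versus = pd.concat([df_versus, ratios]).sort_index()

With the ratio entries precomputed once, the single .loc/.apply line above replaces the whole Python-level loop, which is where the speed-up comes from.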
Upvotes: 1