Optimizing looping and calculating through pandas dataframe that has mutiple if-statements

Question

So i have a csv-file with three columns x,y and rank like this:

x,y,rank
153271,30622,12
143400,125936,6
153718,31606,1
19680,168350,2
99291,47206,19
39208,112928,1
97469,169684,17
...

What I want to do is to iterate through this csv file and find all the x and y coordinates that have rank value of 1. Then "draw" a circle around every coordinate that has rank of 1 and calculate how many values fall inside this circle. Then I want to calculate all the values that have rank value of "1" inside this circle. This way I can calculate a percentage of how many "rank" "1" values are inside these circles compared to other "rank" values.

I have managed to exactly this with the code below but it's super slow. It takes about 3 second to print the WinPercentage for each row with "rank" "1". The dataset is quite large however. It has 320 000 rows from which 1/64 of them have "rank" value of "1".

I've just started working with python and pandas so everything here is quite new to me. I was just wondering if there would be more efficient way to iterate trough the data. I have tried googling and searching from the stackoverflow, but i haven't had any success finding anything usable.

import csv
import pandas as pd

allDF = pd.read_csv ('players-with-rank-data.csv')
winnerDF = allDF[allDF['rank'] == 1]

def WinPercentageCalculator(allDF, winnerDF):
  radius = 20000

  for i in winnerDF.index:
    center_x = allDF.at[i,'x']
    center_y = allDF.at[i,'y']
    jumperCountInsideRadius = 0
    winnerCountInsideRadius = 0

    for row in allDF.index:
      x = allDF.at[row,'x']
      y = allDF.at[row,'y']
      dx = abs(x - center_x)
      dy = abs(-y - -center_y)

      if (dx*dx)+(dy*dy) <= (radius*radius): 
        jumperCountInsideRadius += 1
        rank = allDF.at[row,'rank']

        if rank == 1:
          winnerCountInsideRadius += 1


    winPercentage = (float(winnerCountInsideRadius) / float(jumperCountInsideRadius))
    print(str(winPercentage))


WinPercentageCalculator(allDF, winnerDF)

Naga kiran · Accepted Answer

It would reduce almost half of your time complexity @Ode,

import csv
import pandas as pd

allDF = pd.read_csv ('players-with-rank-data.csv')
winnerDF = allDF[allDF['rank'] == 1]

radius = 20000

for i in winnerDF.index:
    center_x = allDF.at[i,'x']
    center_y = allDF.at[i,'y']

    #Check and resolve the minor changes w.r.t data, as data not visible its blind folded code
    jumperCountInsideRadius = ( allDF['x'].subtract(center_x).pow(2) + 
                                allDF['y'].subtract(center_y).pow(2)  ).lt(radius**2).sum()
    winnerCountInsideRadius = jumperCountInsideRadius.eq(1).sum()


    winPercentage = (float(winnerCountInsideRadius) / float(jumperCountInsideRadius))
    print(str(winPercentage))

Optimizing looping and calculating through pandas dataframe that has mutiple if-statements

Answers (1)

Related Questions