Reputation: 37
So i have a csv-file with three columns x
,y
and rank
like this:
x,y,rank
153271,30622,12
143400,125936,6
153718,31606,1
19680,168350,2
99291,47206,19
39208,112928,1
97469,169684,17
...
What I want to do is to iterate through this csv file and find all the x
and y
coordinates that have rank
value of 1. Then "draw" a circle around every coordinate that has rank of 1 and calculate how many values fall inside this circle. Then I want to calculate all the values that have rank
value of "1" inside this circle. This way I can calculate a percentage of how many "rank" "1" values are inside these circles compared to other "rank" values.
I have managed to exactly this with the code below but it's super slow. It takes about 3 second to print the WinPercentage for each row with "rank" "1". The dataset is quite large however. It has 320 000 rows from which 1/64 of them have "rank" value of "1".
I've just started working with python and pandas so everything here is quite new to me. I was just wondering if there would be more efficient way to iterate trough the data. I have tried googling and searching from the stackoverflow, but i haven't had any success finding anything usable.
import csv
import pandas as pd
allDF = pd.read_csv ('players-with-rank-data.csv')
winnerDF = allDF[allDF['rank'] == 1]
def WinPercentageCalculator(allDF, winnerDF):
radius = 20000
for i in winnerDF.index:
center_x = allDF.at[i,'x']
center_y = allDF.at[i,'y']
jumperCountInsideRadius = 0
winnerCountInsideRadius = 0
for row in allDF.index:
x = allDF.at[row,'x']
y = allDF.at[row,'y']
dx = abs(x - center_x)
dy = abs(-y - -center_y)
if (dx*dx)+(dy*dy) <= (radius*radius):
jumperCountInsideRadius += 1
rank = allDF.at[row,'rank']
if rank == 1:
winnerCountInsideRadius += 1
winPercentage = (float(winnerCountInsideRadius) / float(jumperCountInsideRadius))
print(str(winPercentage))
WinPercentageCalculator(allDF, winnerDF)
Upvotes: 2
Views: 65
Reputation: 4607
It would reduce almost half of your time complexity @Ode,
import csv
import pandas as pd
allDF = pd.read_csv ('players-with-rank-data.csv')
winnerDF = allDF[allDF['rank'] == 1]
radius = 20000
for i in winnerDF.index:
center_x = allDF.at[i,'x']
center_y = allDF.at[i,'y']
#Check and resolve the minor changes w.r.t data, as data not visible its blind folded code
jumperCountInsideRadius = ( allDF['x'].subtract(center_x).pow(2) +
allDF['y'].subtract(center_y).pow(2) ).lt(radius**2).sum()
winnerCountInsideRadius = jumperCountInsideRadius.eq(1).sum()
winPercentage = (float(winnerCountInsideRadius) / float(jumperCountInsideRadius))
print(str(winPercentage))
Upvotes: 2