jonboy

Reputation: 368

Include groupby statement in function - Python

The following function counts the number of points within different segments of a circle. It works as intended when exporting counts for a single point in time. However, when I try to export the counts at different points in time using a groupby call, it still combines all counts into a single output.

import pandas as pd
import numpy as np

df = pd.DataFrame({   
        'Time' : ['19:50:10.1','19:50:10.1','19:50:10.1','19:50:10.1','19:50:10.2','19:50:10.2','19:50:10.2','19:50:10.2'],             
        'id' : ['A','B','C','D','A','B','C','D'],                 
        'x' : [1,8,0,-5,1,-1,-6,0],
        'y' : [-5,2,-5,2,5,-5,-2,2],
        'X2' : [0,0,0,0,0,0,0,0],
        'Y2' : [0,0,0,0,0,0,0,0],   
        'Angle' : [0,0,0,0,0,0,0,0],                 
    })

def checkPoint(x, y, rotation_angle, refX, refY, radius = 10):

    section_angle_start = [(i + rotation_angle - 45) for i in [0, 90, 180, 270, 360]]

    Angle = np.arctan2(x-refX, y-refY) * 180 / np.pi
    Angle = Angle % 360

    # adjust range
    if Angle > section_angle_start[-1]:
        Angle -= 360
    elif Angle < section_angle_start[0]:
        Angle += 360

    for i in range(4):
        if section_angle_start[i] < Angle < section_angle_start[i+1]:
            break
    else:
        i = 0

    return i+1  

tmp = []
result = []

The following is my attempt to pass the checkPoint function to each unique group in Time.

for group in df.groupby('Time'):

    for i, row in df.iterrows():
    
        seg = checkPoint(row.x, row.y, row.Angle, row.X2, row.Y2)

        tmp.append(seg)
    
    result.append([tmp.count(i) for i in [1,2,3,4]])

df = pd.DataFrame(result, columns = ['1','2','3','4'])

Out:

   1  2  3  4
0  2  1  3  2
1  4  2  6  4

Intended Out:

   1  2  3  4
0  0  1  2  1
1  2  0  1  1

Upvotes: 0

Views: 68

Answers (1)

foglerit

Reputation: 8269

Your inner loop iterates over the entire DataFrame on every pass of the outer loop, rather than over the rows of the current group, which produces the double counting you are observing.
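To make the point above concrete, here is a minimal sketch of what iterating over a groupby object yields. The frame demo and the names key, sub_df and group_sizes are made up for illustration; only the Time and id columns mirror the question.

```python
import pandas as pd

# Hypothetical stand-in frame; only 'Time' and 'id' are borrowed from the question.
demo = pd.DataFrame({
    'Time': ['19:50:10.1', '19:50:10.1', '19:50:10.2', '19:50:10.2'],
    'id': ['A', 'B', 'A', 'B'],
})

group_sizes = {}
for key, sub_df in demo.groupby('Time'):
    # key is the group label, sub_df contains only the rows with that label
    group_sizes[key] = len(sub_df)

print(group_sizes)  # two rows per group
print(len(demo))    # four rows in the full frame
```

Each iteration hands back a (label, sub-DataFrame) pair, so the inner loop has to walk the sub-DataFrame, not the original frame.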

As @Kenan suggested, you can limit the inner loop to the group:

for group in df.groupby('Time'):

    for i, row in group[1].iterrows():

        seg = checkPoint(row.x, row.y, row.Angle, row.X2, row.Y2)

        tmp.append(seg)

    result.append([tmp.count(i) for i in [1,2,3,4]])

df_result = pd.DataFrame(result, columns = ['1','2','3','4'])
print(df_result)

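This removes the double counting, but tmp is never cleared between groups, so the second row is still cumulative (it includes the counts from the first group):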

   1  2  3  4
0  0  1  2  1
1  2  1  3  2
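If the explicit loop is preferred, one possible variation is to create the list of segments afresh for every group so that counts do not carry over. The sketch below copies df and checkPoint from the question and uses the question's column names (x, y, X2, Y2); the loop variable names are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': ['19:50:10.1'] * 4 + ['19:50:10.2'] * 4,
    'id': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'x': [1, 8, 0, -5, 1, -1, -6, 0],
    'y': [-5, 2, -5, 2, 5, -5, -2, 2],
    'X2': [0] * 8,
    'Y2': [0] * 8,
    'Angle': [0] * 8,
})

def checkPoint(x, y, rotation_angle, refX, refY, radius=10):
    # Copied from the question: returns the segment (1-4) a point falls in.
    section_angle_start = [(i + rotation_angle - 45) for i in [0, 90, 180, 270, 360]]
    Angle = np.arctan2(x - refX, y - refY) * 180 / np.pi
    Angle = Angle % 360
    if Angle > section_angle_start[-1]:
        Angle -= 360
    elif Angle < section_angle_start[0]:
        Angle += 360
    for i in range(4):
        if section_angle_start[i] < Angle < section_angle_start[i + 1]:
            break
    else:
        i = 0
    return i + 1

result = []
for time, group in df.groupby('Time'):
    tmp = []  # fresh list per group, so earlier groups are not counted again
    for _, row in group.iterrows():
        tmp.append(checkPoint(row.x, row.y, row.Angle, row.X2, row.Y2))
    result.append([tmp.count(s) for s in [1, 2, 3, 4]])

df_result = pd.DataFrame(result, columns=['1', '2', '3', '4'])
print(df_result)
```

With the sample data this should reproduce the intended output from the question, one row per Time value.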

Or you can use a groupby-apply construct to avoid the explicit loop:

def result(g):
    tmp = []
    for i, row in g.iterrows():
        seg = checkPoint(row.x, row.y, row.Angle, row.X2, row.Y2)
        tmp.append(seg)
    return pd.Series([tmp.count(i) for i in [1,2,3,4]], index=[1,2,3,4])

print(df.groupby('Time').apply(result))

Which gets you:

            1  2  3  4
Time                  
19:50:10.1  0  1  2  1
19:50:10.2  2  0  1  1
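Another option, not part of the original answer, is to compute the segment once per row and let pd.crosstab do the counting. This is only a sketch: the segment column and the counts variable are names I introduced, and df and checkPoint are copied from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': ['19:50:10.1'] * 4 + ['19:50:10.2'] * 4,
    'id': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'x': [1, 8, 0, -5, 1, -1, -6, 0],
    'y': [-5, 2, -5, 2, 5, -5, -2, 2],
    'X2': [0] * 8,
    'Y2': [0] * 8,
    'Angle': [0] * 8,
})

def checkPoint(x, y, rotation_angle, refX, refY, radius=10):
    # Copied from the question: returns the segment (1-4) a point falls in.
    section_angle_start = [(i + rotation_angle - 45) for i in [0, 90, 180, 270, 360]]
    Angle = np.arctan2(x - refX, y - refY) * 180 / np.pi
    Angle = Angle % 360
    if Angle > section_angle_start[-1]:
        Angle -= 360
    elif Angle < section_angle_start[0]:
        Angle += 360
    for i in range(4):
        if section_angle_start[i] < Angle < section_angle_start[i + 1]:
            break
    else:
        i = 0
    return i + 1

# One segment label per row ('segment' is a hypothetical helper column).
df['segment'] = [
    checkPoint(x, y, a, rx, ry)
    for x, y, a, rx, ry in zip(df['x'], df['y'], df['Angle'], df['X2'], df['Y2'])
]

# Count rows per (Time, segment); reindex so segments with no points still show as 0.
counts = pd.crosstab(df['Time'], df['segment']).reindex(columns=[1, 2, 3, 4], fill_value=0)
print(counts)
```

The reindex step matters when some segment never occurs in the data, since crosstab only creates columns for values it actually sees.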

Upvotes: 2
