Reputation: 10033

Pandas - populate new column based on existing column values

I have the following dataframe df_shots:

              TableIndex  MatchID  GameWeek           Player  ...      ShotPosition    ShotSide      Close             Position
ShotsDetailID                                                 ...                                                              
6                      5    46605         1  Roberto Firmino  ...  very close range         N/A      close  very close rangeN/A
8                      7    46605         1  Roberto Firmino  ...           the box  the centre  not close    the boxthe centre
10                     9    46605         1  Roberto Firmino  ...           the box    the left  not close      the boxthe left
17                    16    46605         1  Roberto Firmino  ...           the box  the centre      close    the boxthe centre
447                  446    46623         2  Roberto Firmino  ...           the box  the centre      close    the boxthe centre
...                  ...      ...       ...              ...  ...               ...         ...        ...                  ...
6656                6662    46870        27  Roberto Firmino  ...  very close range         N/A      close  very close rangeN/A
6666                6672    46870        27  Roberto Firmino  ...           the box   the right  not close     the boxthe right
6674                6680    46870        27  Roberto Firmino  ...           the box  the centre  not close    the boxthe centre
6676                6682    46870        27  Roberto Firmino  ...           the box    the left  not close      the boxthe left
6679                6685    46870        27  Roberto Firmino  ...   outside the box         N/A  not close   outside the boxN/A

For the sake of clarity, all possible 'Position' values are:

positions = ['a difficult anglethe left',
             'a difficult anglethe right',
             'long rangeN/A',
             'long rangethe centre',
             'long rangethe left',
             'long rangethe right',
             'outside the boxN/A',
             'penaltyN/A',
             'the boxthe centre',
             'the boxthe left',
             'the boxthe right',
             'the six yard boxthe left',
             'the six yard boxthe right',
             'very close rangeN/A']

Now I would like to map the following x/y values to each 'Position' name, storing the value under a new 'Position XY' column:

    the_boxthe_center = {'y':random.randrange(25,45), 'x':random.randrange(0,6)}
    the_boxthe_left = {'y':random.randrange(41,54), 'x':random.randrange(0,16)}
    the_boxthe_right = {'y':random.randrange(14,22), 'x':random.randrange(0,16)}
    very_close_rangeNA = {'y':random.randrange(25,43), 'x':random.randrange(0,4)}
    six_yard_boxthe_left = {'y':random.randrange(33,43), 'x':random.randrange(4,6)}
    six_yard_boxthe_right = {'y':random.randrange(25,33), 'x':random.randrange(4,6)}
    a_diffcult_anglethe_left = {'y':random.randrange(43,54), 'x':random.randrange(0,6)}
    a_diffcult_anglethe_right = {'y':random.randrange(14,25), 'x':random.randrange(0,6)}
    penaltyNA = {'y':random.randrange(36), 'x':random.randrange(8)}
    outside_the_boxNA = {'y':random.randrange(14,54), 'x':random.randrange(16,28)}
    long_rangeNA = {'y':random.randrange(0,68), 'x':random.randrange(40,52)}
    long_rangethe_centre = {'y':random.randrange(0,68), 'x':random.randrange(28,40)}
    long_rangethe_right = {'y':random.randrange(0,14), 'x':random.randrange(0,24)}
    long_rangethe_left = {'y':random.randrange(54,68), 'x':random.randrange(0,24)}

I tried:

if df_shots['Position']=='very close rangeN/A':
        df_shots['Position X/Y']==very_close_rangeNA
...# and so on

But I get:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How do I do this?

Upvotes: 1

Answers (3)

T. Novais

Reputation: 79

Here is a bit of code that should do what you want.

First, create a list of all your "Position XY" values:

position_xy = [the_boxthe_center, the_boxthe_left, ...., long_rangethe_left]  # and so on...

Together with the corresponding list of positions (which you already have), build a dictionary so that every position maps to its x/y value. The two lists must be in the same order, because zip pairs them by index:

dict_positionxy = dict(zip(positions, position_xy))

Then create a new column in your dataframe to hold the x/y values for each position:

# None gives an object column, so each cell can hold a dict
df_shots['Position X/Y'] = None

Now loop through the rows one by one:

for index, row in df_shots.iterrows():
    for key, value in dict_positionxy.items():
        if row['Position'] == key:
            # write the matching x/y dict into this row
            df_shots.at[index, 'Position X/Y'] = value

print(df_shots)

This should do the trick :)
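
The inner loop over dict_positionxy is not strictly needed, because a dictionary can be looked up by key directly. Below is a minimal sketch of that variation. The three-row df_shots and the fixed x/y numbers are made up for illustration (they stand in for the real dataframe and the random.randrange values), so treat it as a sketch under those assumptions rather than a drop-in replacement.

```python
import pandas as pd

# Hypothetical stand-in for df_shots; only the 'Position' column matters here.
df_shots = pd.DataFrame(
    {"Position": ["the boxthe centre", "very close rangeN/A", "the boxthe centre"]},
    index=[8, 6, 17],
)

# Fixed numbers instead of random.randrange so the result is reproducible.
dict_positionxy = {
    "the boxthe centre": {"y": 30, "x": 3},
    "very close rangeN/A": {"y": 28, "x": 1},
}

# Object-dtype column so each cell can hold a dict.
df_shots["Position X/Y"] = None

for index, row in df_shots.iterrows():
    # dict.get returns None when a position has no entry in the dictionary.
    df_shots.at[index, "Position X/Y"] = dict_positionxy.get(row["Position"])

print(df_shots)
```

This does one dictionary lookup per row instead of comparing each row against every key.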

Upvotes: 0

Umar.H

Reputation: 23099

It's bad form to keep so many related variables outside of a container; let's use a dictionary and map it onto your dataframe.

data_dict = {'the boxthe centre': {'y': random.randrange(25, 45), ...}, ...}


df['Position'] = df['Position'].map(data_dict)

print(df['Position'])
6        {'y': 35, 'x': 2}
8        {'y': 32, 'x': 1}
10      {'y': 44, 'x': 11}
17       {'y': 32, 'x': 1}
447      {'y': 32, 'x': 1}
...                    NaN
6656     {'y': 35, 'x': 2}
6666    {'y': 15, 'x': 11}
6674     {'y': 32, 'x': 1}
6676    {'y': 44, 'x': 11}
6679    {'y': 37, 'x': 16}
Name: Position, dtype: object
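
The question asked for a new column rather than overwriting 'Position', and unmapped positions come back as NaN. A minimal sketch of both points follows. The three-row frame, the fixed x/y numbers, and the extra 'x' and 'y' columns are assumptions made for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe.
df = pd.DataFrame(
    {"Position": ["the boxthe centre", "the boxthe left", "penaltyN/A"]},
    index=[8, 10, 99],
)

# Fixed numbers instead of random.randrange; 'penaltyN/A' is left out on purpose.
data_dict = {
    "the boxthe centre": {"y": 32, "x": 1},
    "the boxthe left": {"y": 44, "x": 11},
}

# Keep 'Position' intact and write the mapped dicts to a new column.
df["Position XY"] = df["Position"].map(data_dict)

# Optional: separate numeric columns, built from per-axis lookup tables.
df["x"] = df["Position"].map({name: xy["x"] for name, xy in data_dict.items()})
df["y"] = df["Position"].map({name: xy["y"] for name, xy in data_dict.items()})

print(df)
```

Rows whose position is missing from data_dict (here 'penaltyN/A') get NaN in every mapped column, which is an easy way to spot keys that were misspelled or forgotten.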

Upvotes: 1

ncasale

Reputation: 879

Here's some sample code that accomplishes what you want. I created a basic mockup of df_shots, but this should run the same on your larger DataFrame. I've also stored some of those free variables in a dict to make filtering simpler.

Note that because the random values in positions_xy are computed once up front, every shot with the same position gets the same x/y values. This may or may not be what you intended.

import pandas as pd
import random

# Sample df_shots
df_shots = pd.DataFrame({'Position': ['the_boxthe_center', 'the_boxthe_left']})

# Store position/xy pairs in dict
positions_xy = {'the_boxthe_center': {'y': random.randrange(25, 45), 'x': random.randrange(0, 6)},
                'the_boxthe_left': {'y': random.randrange(41, 54), 'x': random.randrange(0, 16)}}

# Create new column
df_shots['Position XY'] = ''

# Iterate over all position/xy pairs
for position, xy in positions_xy.items():
    # Determine indices of all rows whose position matches
    matches = df_shots['Position'] == position
    matches_indices = matches[matches].index
    # Update matching rows in df_shots with xy
    for idx in matches_indices:
        df_shots.at[idx, 'Position XY'] = xy

print(df_shots)

Outputs:

            Position        Position XY
0  the_boxthe_center  {'y': 36, 'x': 2}
1    the_boxthe_left  {'y': 44, 'x': 0}
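
If a different random x/y per shot is wanted instead, one option is to store the randrange bounds rather than pre-drawn values and draw once per row. This is a sketch under that assumption: position_ranges and draw_xy are hypothetical names introduced here, and the mockup frame reuses the two positions from the sample above.

```python
import random

import pandas as pd

# Mockup of df_shots with a repeated position.
df_shots = pd.DataFrame(
    {'Position': ['the_boxthe_center', 'the_boxthe_left', 'the_boxthe_center']}
)

# Store the (start, stop) arguments for random.randrange, not drawn values.
position_ranges = {
    'the_boxthe_center': {'y': (25, 45), 'x': (0, 6)},
    'the_boxthe_left': {'y': (41, 54), 'x': (0, 16)},
}


def draw_xy(position):
    """Draw a fresh x/y dict for one row; return None for an unknown position."""
    ranges = position_ranges.get(position)
    if ranges is None:
        return None
    return {axis: random.randrange(*bounds) for axis, bounds in ranges.items()}


random.seed(0)  # only so that repeated runs of this sketch give the same draws
df_shots['Position XY'] = df_shots['Position'].map(draw_xy)

print(df_shots)
```

Because draw_xy is called once per row, two shots from the same position can land on different coordinates while staying inside that position's bounds.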

Upvotes: 0
