Fowler Fox
Fowler Fox

Reputation: 131

How to retrieve a sequence of records from a Pandas dataframe that contains a certain value at the end of one of the columns?

I'm using event based data, looking to only return rows that are in a certain sequence. Below is an example of a sequence and the structure of the data.

data

My issue is with returning a sequence like this, the idea is to search through the action_name column to find a goal entry and then return every pass/action that occurred leading up to the goal. The way to link the actions is by the possesion_number and match_id as these will be unique for each sequence.

I can complete the first part by returning all the pass and goal actions into a separate dataframe with below code but how do I retain sequences that only end with a goal?

pass_goal = data.loc[data.action_name.isin(['Attacking pass accurate',
                         'Non attacking pass accurate','Accurate key pass','goal'])]

Upvotes: 1

Views: 366

Answers (2)

Dave
Dave

Reputation: 2049

I think there's a better way to approach the problem. First, aggregate up to the level of the possession, keeping the actions and some way to sequence them (second?)

posses = data.groupby('possession_number')[['second', 'action_name']].apply(lambda action: action.values.tolist()).reset_index(name='action_seq')

Then order the actions within and identify ones where the last action is a goal.

posses['ends_in_goal'] = posses.apply(lambda pos: sorted(pos['action_seq'])[-1][1] == 'goal', axis=1)

EDIT:

To clarify, the apply method evaluates a function on (in this case, with axis=1) rows and assigns the output to a new column. Here's the function with a breakout:

lambda pos:                    # Define a lambda with the row as "pos"
    sorted(pos['action_seq'])  # Sort the "action_seq" list of tuples by the first tuple value, "second" (see above)
    [-1][1]                    # Get the last tuple in the sorted list of tuples, and from the last tuple, get the second value, "action_name"
  == 'goal'                    # Return True if the "action_name" of the last tuple in the sorted list of actions for that possession is "goal" 

If you want to sort the actions in the possession by a different column, create the list of tuples with that column as the first tuple value (see ['second', 'action_name'] from above.) Then get the value for action_name from the appropriate place in the tuple.

Upvotes: 1

BStadlbauer
BStadlbauer

Reputation: 1285

Assuming that all your data is in df and that every goal has another pair of ['possion_code', 'match_id'], you could to something like the following:

goals = df.groupby(['posession_number', 'match_id']). \
          filter(lambda grouped_df: (grouped_df['action_name'] == 'goal').any())

Upvotes: 0

Related Questions