Reputation: 131
I'm using event based data, looking to only return rows that are in a certain sequence. Below is an example of a sequence and the structure of the data.
My issue is with returning a sequence like this, the idea is to search through the action_name column to find a goal entry and then return every pass/action that occurred leading up to the goal. The way to link the actions is by the possesion_number and match_id as these will be unique for each sequence.
I can complete the first part by returning all the pass and goal actions into a separate dataframe with below code but how do I retain sequences that only end with a goal?
pass_goal = data.loc[data.action_name.isin(['Attacking pass accurate',
'Non attacking pass accurate','Accurate key pass','goal'])]
Upvotes: 1
Views: 366
Reputation: 2049
I think there's a better way to approach the problem. First, aggregate up to the level of the possession, keeping the actions and some way to sequence them (second
?)
posses = data.groupby('possession_number')[['second', 'action_name']].apply(lambda action: action.values.tolist()).reset_index(name='action_seq')
Then order the actions within and identify ones where the last action is a goal.
posses['ends_in_goal'] = posses.apply(lambda pos: sorted(pos['action_seq'])[-1][1] == 'goal', axis=1)
EDIT:
To clarify, the apply
method evaluates a function on (in this case, with axis=1
) rows and assigns the output to a new column.
Here's the function with a breakout:
lambda pos: # Define a lambda with the row as "pos"
sorted(pos['action_seq']) # Sort the "action_seq" list of tuples by the first tuple value, "second" (see above)
[-1][1] # Get the last tuple in the sorted list of tuples, and from the last tuple, get the second value, "action_name"
== 'goal' # Return True if the "action_name" of the last tuple in the sorted list of actions for that possession is "goal"
If you want to sort the actions in the possession by a different column, create the list of tuples with that column as the first tuple value (see ['second', 'action_name']
from above.) Then get the value for action_name
from the appropriate place in the tuple.
Upvotes: 1
Reputation: 1285
Assuming that all your data is in df
and that every goal has another pair of ['possion_code', 'match_id']
, you could to something like the following:
goals = df.groupby(['posession_number', 'match_id']). \
filter(lambda grouped_df: (grouped_df['action_name'] == 'goal').any())
Upvotes: 0