Reputation: 43
I have a dataframe similar to the following:
| Session | Leg | Talk Time Seconds | ... |
| ------- | --- | ----------------- | --- |
| 1 | 1 | 0 | ... |
| 1 | 2 | 40 | ... |
| 2 | 1 | 35 | ... |
| 2 | 2 | 50 | ... |
| ... | ... | ... | ... |
If it is not the last leg in a particular session and Talk Time Seconds > 0 then the leg should be marked as having been transferred, as so:
| Session | Leg | Talk Time Seconds | Transfer | ... |
| ------- | --- | ----------------- | -------- | --- |
| 1 | 1 | 0 | False | ... |
| 1 | 2 | 40 | False | ... |
| 2 | 1 | 35 | True | ... |
| 2 | 2 | 50 | False | ... |
| ... | ... | ... | ... | ... |
I'm currently accomplishing this by breaking down the dataframe into a list of dataframes for each session. Using an apply and then concatenating the dataframes back together. This is taking a very long time to run and I am wondering if there is a better way to do this. My code is below:
# get list of unique sessions
sessions = df['Session'].unique().tolist()
# list of dataframes for each session
session_dfs = [df[df['Session'] == session] for session in sessions]
# if it is not the last leg of the session and talk time is greater than 1 we can assume it is a transfer
for sdf in session_dfs:
last_leg = sdf['Leg'].max()
sdf['Transfer'] = sdf.apply(lambda row: row['Talk Time Seconds'] > 0 if row['Leg'] != last_leg else False, axis=1)
# join all the session dataframes into one dataframe
df = pd.concat(session_dfs)
EDIT: When attempting the following suggestion:
df['Transfer'] = (
df
.groupby("Session", as_index=True)
.apply(lambda x: (x['Leg'] != x['Leg'].max()) & (x['Talk Time Seconds'].gt(0)))
.reset_index(drop=True)
)
I get this output:
>>> df[(df['Transfer']==1) & (df['Talk Time Seconds'] == 0)][['Session','Leg','Talk Time Seconds','Transfer']]
Session Leg Talk Time Seconds Transfer
11060 51000011288 1 0 True
16167 51000011528 1 0 True
15176 51000012182 1 0 True
8468 51000012228 1 0 True
2175 51000012585 2 0 True
It looks like it is occurring on instances where talk time is 0 but it is the last leg of the call:
>>> df[df['Session'] == 51000012585][['Session','Leg','Talk Time Seconds','Transfer']]
Session Leg Talk Time Seconds Transfer
2175 51000012585 2 0 True
3452 51000012585 1 0 False
Upvotes: 1
Views: 681
Reputation: 3001
Here is one way.
First, create the data frame:
d = {'Session': {0: 1, 1: 1, 2: 2, 3: 2},
'Leg': {0: 1, 1: 2, 2: 1, 3: 2},
'TalkTimeSeconds': {0: 0, 1: 40, 2: 35, 3: 50}}
df = pd.DataFrame.from_dict(d)
Second, find the max leg number for each session; .groupby().transform()
returns a series that's the same length as the original series.
df['max_leg'] = df.groupby('Session')['Leg'].transform('max')
Third, apply the logic to identify transfers (before last leg, time > 0):
df['Transfer'] = (df['Leg'] < df['max_leg']) & (df['TalkTimeSeconds'] > 0)
The result is shown below (I left max_leg) in, to show the result of the groupby/transform statement):
Session Leg TalkTimeSeconds max_leg Transfer
0 1 1 0 2 False
1 1 2 40 2 False
2 2 1 35 2 True
3 2 2 50 2 False
Upvotes: 2
Reputation: 1227
I suggest to use a mask to recognize if this is the last Leg
of the current Session
and one other to match the condition on Talk Time Seconds
.
For the first part you can group by
to take the maximum Leg
of each Session
and use pd.Series.map
to check the criteria. For the second past a simple filter is enough.
The following code should satisfy your requirements
df["Transfer"] = (df["Leg"] != df["Session"].map(df.groupby("Session")["Leg"].max())) & (df["Talk Time Seconds"] > 0)
Upvotes: 1
Reputation: 21719
You can do:
df['Transfer'] = (df
.groupby("Session", as_index=True)
.apply(lambda x: (x['Leg'] != x['Leg'].max()) & (x['Talk Time Seconds'].gt(0)))
.reset_index(drop=True))
print(df)
Session Leg Talk Time Seconds Transfer
0 1 1 0 False
1 1 2 40 False
2 2 1 35 True
3 2 2 50 False
Upvotes: 0