Moses Wynn
Moses Wynn

Reputation: 43

How can I reference other rows in a Pandas dataframe apply

I have a dataframe similar to the following:

| Session | Leg | Talk Time Seconds | ... |
| ------- | --- | ----------------- | --- |
| 1       | 1   | 0                 | ... |
| 1       | 2   | 40                | ... |
| 2       | 1   | 35                | ... |
| 2       | 2   | 50                | ... |
| ...     | ... | ...               | ... |

If it is not the last leg in a particular session and Talk Time Seconds > 0 then the leg should be marked as having been transferred, as so:

| Session | Leg | Talk Time Seconds | Transfer | ... |
| ------- | --- | ----------------- | -------- | --- |
| 1       | 1   | 0                 | False    | ... |
| 1       | 2   | 40                | False    | ... |
| 2       | 1   | 35                | True     | ... |
| 2       | 2   | 50                | False    | ... |
| ...     | ... | ...               | ...      | ... |

I'm currently accomplishing this by breaking down the dataframe into a list of dataframes for each session. Using an apply and then concatenating the dataframes back together. This is taking a very long time to run and I am wondering if there is a better way to do this. My code is below:

# get list of unique sessions
sessions = df['Session'].unique().tolist()

# list of dataframes for each session
session_dfs = [df[df['Session'] == session] for session in sessions]

# if it is not the last leg of the session and talk time is greater than 1 we can assume it is a transfer
for sdf in session_dfs:
    last_leg = sdf['Leg'].max()
    sdf['Transfer'] = sdf.apply(lambda row: row['Talk Time Seconds'] > 0 if row['Leg'] != last_leg else False, axis=1)

# join all the session dataframes into one dataframe
df = pd.concat(session_dfs)

EDIT: When attempting the following suggestion:

df['Transfer'] = (
    df
      .groupby("Session", as_index=True)
      .apply(lambda x: (x['Leg'] != x['Leg'].max()) & (x['Talk Time Seconds'].gt(0)))
      .reset_index(drop=True)
)

I get this output:

>>> df[(df['Transfer']==1) & (df['Talk Time Seconds'] == 0)][['Session','Leg','Talk Time Seconds','Transfer']]

        Session         Leg   Talk Time Seconds    Transfer
11060   51000011288     1     0                    True
16167   51000011528     1     0                    True
15176   51000012182     1     0                    True
8468    51000012228     1     0                    True
2175    51000012585     2     0                    True

It looks like it is occurring on instances where talk time is 0 but it is the last leg of the call:

>>> df[df['Session'] == 51000012585][['Session','Leg','Talk Time Seconds','Transfer']]

        Session         Leg   Talk Time Seconds    Transfer
2175    51000012585     2     0                    True
3452    51000012585     1     0                    False

Upvotes: 1

Views: 681

Answers (3)

jsmart
jsmart

Reputation: 3001

Here is one way.

First, create the data frame:

d = {'Session': {0: 1, 1: 1, 2: 2, 3: 2},
 'Leg': {0: 1, 1: 2, 2: 1, 3: 2},
 'TalkTimeSeconds': {0: 0, 1: 40, 2: 35, 3: 50}}
df = pd.DataFrame.from_dict(d)

Second, find the max leg number for each session; .groupby().transform() returns a series that's the same length as the original series.

df['max_leg'] = df.groupby('Session')['Leg'].transform('max')

Third, apply the logic to identify transfers (before last leg, time > 0):

df['Transfer'] = (df['Leg'] < df['max_leg']) & (df['TalkTimeSeconds'] > 0)

The result is shown below (I left max_leg) in, to show the result of the groupby/transform statement):

   Session  Leg  TalkTimeSeconds  max_leg  Transfer
0        1    1                0        2     False
1        1    2               40        2     False
2        2    1               35        2      True
3        2    2               50        2     False

Upvotes: 2

Roberto Trani
Roberto Trani

Reputation: 1227

I suggest to use a mask to recognize if this is the last Leg of the current Session and one other to match the condition on Talk Time Seconds.

For the first part you can group by to take the maximum Leg of each Session and use pd.Series.map to check the criteria. For the second past a simple filter is enough.

The following code should satisfy your requirements

df["Transfer"] = (df["Leg"] != df["Session"].map(df.groupby("Session")["Leg"].max())) & (df["Talk Time Seconds"] > 0)

Upvotes: 1

YOLO
YOLO

Reputation: 21719

You can do:

df['Transfer'] = (df
                  .groupby("Session", as_index=True)
                  .apply(lambda x: (x['Leg'] != x['Leg'].max()) & (x['Talk Time Seconds'].gt(0)))
                  .reset_index(drop=True))

print(df)

   Session  Leg  Talk Time Seconds  Transfer
0        1    1                  0     False
1        1    2                 40     False
2        2    1                 35      True
3        2    2                 50     False

Upvotes: 0

Related Questions