Reputation: 508
I have a dataset of assignments per case. I'm trying to fill the last_event of the most recently assigned attorney on each case with the cas_closed_date.
case_nb  attorney  cas_closed_date  assigned    last_event
1        A         2015-07-02       2015-07-02  2015-07-02
2        B         2015-09-10       2015-09-10  2015-09-10
3        C         2016-03-24       2016-03-24  2016-03-24
4        D         2018-06-07       2013-10-21  2014-02-06
4        C         2018-06-07       2013-09-13  2013-09-13
4        F         2018-06-07       2018-03-31  2018-04-05
(For replication:
import pandas as pd

df = pd.DataFrame({'assigned': {0: '2015-07-02', 1: '2015-09-10', 2: '2016-03-24', 3: '2013-10-21', 4: '2013-09-13', 5: '2018-03-31'},
                   'attorney': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'C', 5: 'F'},
                   'cas_closed_date': {0: '2015-07-02', 1: '2015-09-10', 2: '2016-03-24', 3: '2018-06-07', 4: '2018-06-07', 5: '2018-06-07'},
                   'case_nb': {0: 1, 1: 2, 2: 3, 3: 4, 4: 4, 5: 4},
                   'last_event': {0: '2015-07-02', 1: '2015-09-10', 2: '2016-03-24', 3: '2014-02-06', 4: '2013-09-13', 5: '2018-04-05'}})
# parse the date columns so comparisons behave like dates rather than strings
for col in ['assigned', 'cas_closed_date', 'last_event']:
    df[col] = pd.to_datetime(df[col])
)
My dataset includes the case number, attorney assigned, the case closed date, the date the attorney was assigned, and the last event the attorney appeared on. For the above example, I'd want the last line to be
4        F         2018-06-07       2018-03-31  2018-06-07
I've seen a few ways of filling in NAs based on data in the same column, such as this question. But those approaches use transform, which I can't get to work across multiple columns. What I have so far works with apply, but not transform:
def fixdate(gp):
    last_unasgn = gp.iloc[-1]['last_event']
    if gp.iloc[-1]['cas_closed_date'] > last_unasgn:
        return gp.iloc[-1]['cas_closed_date']
    else:
        return last_unasgn

asmt.groupby('evt_file_number').apply(lambda x: fixdate(x))
> 4 2018-06-07
Is there a way to use transform? If not, what's the best way to use that aggregate data to fill in my original dataset?
Upvotes: 1
Views: 157
Reputation: 13437
If the same case_nb is never assigned to two different attorneys on the same day, you can try this solution.
import numpy as np
import pandas as pd
df = pd.DataFrame({'assigned': {0: pd.Timestamp('2015-07-02'), 1: '2015-09-10',
2: '2016-03-24', 3: '2013-10-21', 4: '2013-09-13',
5: '2018-03-31'},
'attorney': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'C', 5: 'F'},
'cas_closed_date': {0: pd.Timestamp('2015-07-02'), 1: '2015-09-10',
2: '2016-03-24', 3: '2018-06-07', 4: '2018-06-07',
5: '2018-06-07'},
'case_nb': {0: 1, 1: 2, 2: 3, 3: 4, 4: 4, 5: 4},
'last_event': {0: pd.Timestamp('2015-07-02'), 1: '2015-09-10',
2: '2016-03-24', 3: '2014-02-06', 4: '2013-09-13',
5: '2018-04-05'}})
# latest assignment date per case
out = df.groupby("case_nb")["assigned"].max().reset_index()
out["last"] = True
# flag the row holding that latest assignment in each case
df = pd.merge(df, out, how="left", on=["case_nb", "assigned"])
# on the flagged rows, overwrite last_event with the close date
df["last_event"] = np.where(df["last"] == True, df["cas_closed_date"], df["last_event"])
del df["last"], out
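If you want a quick check of the result (just a usage note on the example df above, nothing beyond the sample data), only attorney F, the latest assignment on case 4, should end up with last_event equal to the close date:
# quick check: only attorney F (latest assignment on case 4) should now
# have last_event == 2018-06-07; the other case-4 rows are unchanged
print(df.loc[df["case_nb"] == 4, ["attorney", "assigned", "last_event"]])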
There is probably a more elegant solution, but at least this one is vectorized (see np.where) and doesn't use apply, which helps pandas performance.
EDIT: In case you want to use transform, there is this option, which seems to me to be the fastest solution.
df["last_assigned"] = df.groupby("case_nb")["assigned"].transform("max")
df["last_event"] = np.where(df["assigned"]==df["last_assigned"],
df["cas_closed_date"],
df["last_event"])
del df["last_assigned"]
Upvotes: 1
Reputation: 508
I ended up doing a combination of transform and just plain indexing.
df = df.sort_values(['case_nb', 'last_event'])
df['last_unassigned'] = df.groupby('case_nb')['last_event'].transform('last')
df.loc[(df['last_event'] == df['last_unassigned'])
       & (df['last_unassigned'] != df['cas_closed_date']), 'last_event'] = df['cas_closed_date']
Upvotes: 0
Reputation: 76
The .transform operation passes the columns of each group to the function one at a time, each as a one-dimensional Series, while the .apply operation passes each group to the function as a whole DataFrame. So .transform works on one Series (one column) at a time, which is why it can't combine cas_closed_date and last_event, whereas .apply sees the entire DataFrame (all columns) for the group. Hope this helps.
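To make that concrete, here is a minimal sketch you can run against the df from the question (show_transform and show_apply are just illustrative names; the apply version mirrors what fixdate does by combining two columns):
def show_transform(s):
    # transform hands the function one column of one group as a Series
    print("transform got a", type(s).__name__, "for column", s.name)
    return s

def show_apply(g):
    # apply hands the function the whole group as a DataFrame,
    # so it is free to combine several columns, like fixdate does
    print("apply got a", type(g).__name__, "with columns", list(g.columns))
    return max(g.iloc[-1]["cas_closed_date"], g.iloc[-1]["last_event"])

df.groupby("case_nb")[["assigned", "last_event"]].transform(show_transform)
df.groupby("case_nb").apply(show_apply)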
Upvotes: 1