Chet Meinzer
Chet Meinzer

Reputation: 1741

mark last record in group by date pandas python

I want to mark the last record in group ['ru','case','opdate'] based on 'lst_svc'

Here is my attempt (problematic because the duplicate is counted twice).

import pandas as pd
from datetime import datetime

# Create Dateframe
d = {'ru'     : pd.Series([1., 1., 1., 1., 3, 3]),
     'case'   : pd.Series([2., 2., 2., 2., 2, 2]),
     'opdate' : pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 2), datetime(2012, 5, 2),datetime(2012, 5, 2), datetime(2012, 5, 3),datetime(2012, 5, 3)]),
     'lst_svc': pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 5),datetime(2012, 5, 5),datetime(2012, 6, 5),])}

df = pd.DataFrame(d)

# Mark last
df['lastMark'] = (df.groupby(['ru','case','opdate'])['lst_svc'].transform(max) == df['lst_svc']).astype(int)

The DataFrame looks like this:

   case    lst_svc     opdate   ru
0   2.0 2012-05-02 2012-05-02  1.0
1   2.0 2012-05-03 2012-05-02  1.0
2   2.0 2012-05-05 2012-05-02  1.0
3   2.0 2012-05-05 2012-05-02  1.0
4   2.0 2012-06-05 2012-05-03  3.0
5   2.0        NaT 2012-05-03  3.0

The (wrong) result of my code looks like this:

   case    lst_svc     opdate   ru  lastMark
0   2.0 2012-05-02 2012-05-02  1.0         0
1   2.0 2012-05-03 2012-05-02  1.0         0
2   2.0 2012-05-05 2012-05-02  1.0         1
3   2.0 2012-05-05 2012-05-02  1.0         1
4   2.0 2012-06-05 2012-05-03  3.0         1
5   2.0        NaT 2012-05-03  3.0         0

Upvotes: 3

Views: 1309

Answers (1)

HYRY
HYRY

Reputation: 97301

How about:

import pandas as pd
from datetime import datetime

# Create example DateFrame
d = {'ru'     : pd.Series([1., 1., 1., 1., 3, 3]),
     'case'   : pd.Series([2., 2., 2., 2., 2, 2]),
     'opdate' : pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 2), datetime(2012, 5, 2),datetime(2012, 5, 2), datetime(2012, 5, 3),datetime(2012, 5, 3)]),
     'lst_svc': pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 5),datetime(2012, 5, 5),datetime(2012, 6, 5),])}

df = pd.DataFrame(d)

# Mark last      
def f(s):
    s2 = pd.Series(0, index=s.index)
    s2.iloc[-1] = 1
    return s2

df["lastMark"] = df.groupby(['ru','case','opdate'])['lst_svc'].apply(f)

The output looks like this:

   case    lst_svc     opdate   ru  lastMark
0   2.0 2012-05-02 2012-05-02  1.0         0
1   2.0 2012-05-03 2012-05-02  1.0         0
2   2.0 2012-05-05 2012-05-02  1.0         0
3   2.0 2012-05-05 2012-05-02  1.0         1
4   2.0 2012-06-05 2012-05-03  3.0         0
5   2.0        NaT 2012-05-03  3.0         1

Upvotes: 5

Related Questions