Umar.H
Umar.H

Reputation: 23099

count latest occurrence of a status - Pandas

I have a sort of inelegant solution for the following problem and I was wondering if there was a better way to handle the following sort of dataset,

I have a small data set of around 150k rows which has 3 essential rows I'm trying to wrangle :

ID  Status Week
1   base   1
1   over   2
1   base   3
1   over   4
1   over   5
1   under  6

What would be the most efficient way of counting the final occurrences of over ?

essentially, for the data-set above I would say ID == 1 was last at the Status over starting at week 4 for a total of two weeks, 4 and 5.

My method at the moment is to sort my df by ID and Week, then groupby.tail() to get the last occurrences of the over and under and sum the differences of the weeks.

I feel a bit silly as I'm sure this is a simple problem..!

target dataset

ID Duration of Last-Over, Start-Week, End-Week  
1, 2,                     4,           5  

original DataFrame :

ID, Budget, Spend, Week, Status
1,  50,     50,    1,    base
1   50,     55,    2,    over
1   50,     50,    3,    base
1   50,     250,   4,    over
1   50,     300,   5,    over
1   50,     42,    6,    under. 

Upvotes: 1

Views: 52

Answers (1)

Daniel Labbe
Daniel Labbe

Reputation: 2019

This code below does that in two steps:

last = df[df.Status == 'over'][::-1][0:1].index
before_last = df[df.Status == 'over'][::-1][1:2].index
diff = df.iloc[last,3].values - df.iloc[before_last,3].values +1
ID = df.iloc[last,0]

new_df = pd.DataFrame({'ID': {0: ID.values[0]}, 
                       'Duration of Last-Over':  {0: diff[0]}, 
                       'Start-Week':  {0: df.iloc[before_last]['Week'].values[0]}, 
                       'End-Week':  {0: df.iloc[last]['Week'].values[0]}})

Producing this output:

enter image description here

EDIT: maybe a more simple answer:

Getting just the last two occurrences of 'over' and manipulate it.

df_aux = df[df.Status == 'over'].tail(2)
new_df = pd.DataFrame({'ID': {0: df_aux.iloc[1,0]}, 
                       'Duration of Last-Over':  {0: df_aux.iloc[1,3] - df_aux.iloc[0,3] +1}, 
                       'Start-Week':  {0: df_aux.iloc[0,3]}, 
                       'End-Week':  {0: df_aux.iloc[1,3]}})

Upvotes: 1

Related Questions