Walt Reed

Reputation: 1466

Pandas - Identify Last Row by Date

I'm trying to accomplish two things in my Pandas dataframe:

  1. Create a new column LastRow ('Yes' or 'No') that flags the last row for each date in DateCompleted.
  2. Capture the next transaction's Sales on the current row, unless the next row starts a new date in DateCompleted (in which case mark it as Null).

Original Dataset

        DateCompleted      TranNumber  Sales

    0   1/1/17 10:15AM     3133         130.31
    1   1/1/17 11:21AM     3531         103.12  
    2   1/1/17 12:31PM     3652         99.23  
    3   1/2/17 9:31AM      3689         83.22
    4   1/2/17 10:31AM     3701         29.93
    5   1/3/17 8:30AM      3709         31.31 

Desired Output

        DateCompleted      TranNumber   Sales    NextTranSales  LastRow

    0   1/1/17 10:15AM     3133         130.31   103.12         No
    1   1/1/17 11:21AM     3531         103.12   99.23          No
    2   1/1/17 12:31PM     3652         99.23    NaN            Yes
    3   1/2/17 9:31AM      3689         83.22    29.93          No 
    4   1/2/17 10:31AM     3701         29.93    NaN            Yes
    5   1/3/17 8:30AM      3709         31.31    ...            No

I can get the NextTranSales based on:

    df['NextTranSales'] = df.Sales.shift(-1)

But I'm having trouble determining the last row in the DateCompleted group and marking NextTranSales as Null if it is the last row.

Thanks for your help!

Upvotes: 5

Views: 3782

Answers (2)

piRSquared

Reputation: 294348

NOTE: This depends on Sales being free of NaN. If it has any NaN we will get erroneous determinations of last row. This happens because I'm leveraging the convenience that the shifted column leaves a NaN in the last position.

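# Assumes DateCompleted is already datetime64; if it holds strings, convert it first with pd.to_datetime
d = df.DateCompleted.dt.date
m = {True: 'Yes', False: 'No'}
# Shift Sales up one row within each calendar date; the last row of each date gets NaN
s = df.groupby(d).Sales.shift(-1)
df = df.assign(NextTranSales=s).assign(LastRow=s.isnull().map(m))
print(df)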

        DateCompleted  TranNumber   Sales  NextTranSales LastRow
0 2017-01-01 10:15:00        3133  130.31         103.12      No
1 2017-01-01 11:21:00        3531  103.12          99.23      No
2 2017-01-01 12:31:00        3652   99.23            NaN     Yes
3 2017-01-02 09:31:00        3689   83.22          29.93      No
4 2017-01-02 10:31:00        3701   29.93            NaN     Yes
5 2017-01-03 08:30:00        3709   31.31            NaN     Yes
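
To see the caveat from the note in action, here is a small sketch that rebuilds the question's data and blanks out one sale. The missing value at row 1 is a hypothetical gap added for illustration, and the timestamps are written in ISO form to match the printed output above.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with timestamps in ISO form.
df = pd.DataFrame({
    "DateCompleted": pd.to_datetime([
        "2017-01-01 10:15", "2017-01-01 11:21", "2017-01-01 12:31",
        "2017-01-02 09:31", "2017-01-02 10:31", "2017-01-03 08:30",
    ]),
    "TranNumber": [3133, 3531, 3652, 3689, 3701, 3709],
    "Sales": [130.31, 103.12, 99.23, 83.22, 29.93, 31.31],
})

# Hypothetical gap (not in the question): the second sale on 2017-01-01 is missing.
df.loc[1, "Sales"] = np.nan

d = df.DateCompleted.dt.date
s = df.groupby(d).Sales.shift(-1)
last_row = s.isnull().map({True: "Yes", False: "No"})

# Row 0 is not the last row of its date, but the value shifted into it
# is the missing sale, so it is wrongly flagged "Yes".
print(last_row.tolist())
# ['Yes', 'No', 'Yes', 'No', 'Yes', 'Yes']
```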

We can drop the no-NaN restriction by flagging the last row of each date directly:

d = df.DateCompleted.dt.date
s = df.groupby(d).Sales.shift(-1)
# 'Yes' at the index of each date's final row, 'No' everywhere else
l = pd.Series(
    'Yes', df.groupby(d).tail(1).index
).reindex(df.index, fill_value='No')
df.assign(NextTranSales=s).assign(LastRow=l)

        DateCompleted  TranNumber   Sales  NextTranSales LastRow
0 2017-01-01 10:15:00        3133  130.31         103.12      No
1 2017-01-01 11:21:00        3531  103.12          99.23      No
2 2017-01-01 12:31:00        3652   99.23            NaN     Yes
3 2017-01-02 09:31:00        3689   83.22          29.93      No
4 2017-01-02 10:31:00        3701   29.93            NaN     Yes
5 2017-01-03 08:30:00        3709   31.31            NaN     Yes
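
As an alternative sketch (not the method used above), the same flag can probably be built with `Series.duplicated(keep='last')`, which marks every occurrence of a date except the final one. The data below is the question's sample with one hypothetical missing sale, to show that the flag no longer depends on Sales.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "DateCompleted": pd.to_datetime([
        "2017-01-01 10:15", "2017-01-01 11:21", "2017-01-01 12:31",
        "2017-01-02 09:31", "2017-01-02 10:31", "2017-01-03 08:30",
    ]),
    "TranNumber": [3133, 3531, 3652, 3689, 3701, 3709],
    # Hypothetical NaN at row 1 (not in the question's data).
    "Sales": [130.31, np.nan, 99.23, 83.22, 29.93, 31.31],
})

d = df.DateCompleted.dt.date

# duplicated(keep="last") is True for every row except the last
# occurrence of each date, so those rows get "No" and the rest "Yes".
df["LastRow"] = np.where(d.duplicated(keep="last"), "No", "Yes")
df["NextTranSales"] = df.groupby(d).Sales.shift(-1)

print(df["LastRow"].tolist())
# ['No', 'No', 'Yes', 'No', 'Yes', 'Yes']
```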

Upvotes: 2

akuiper

Reputation: 214987

If your data frame has been sorted by the DateCompleted column, then you might just need groupby.shift:

date = pd.to_datetime(df.DateCompleted).dt.date    
df["NextTranSales"] = df.groupby(date).Sales.shift(-1)
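
If the frame is not already sorted, a minimal sketch of sorting first might look like this. The shuffled four-row frame is a hypothetical subset of the question's data; `groupby(...).shift` follows the existing row order within each date, so the sort is what makes "next" mean chronologically next.

```python
import pandas as pd

# Hypothetical unsorted subset of the question's data.
df = pd.DataFrame({
    "DateCompleted": pd.to_datetime([
        "2017-01-02 10:31", "2017-01-01 10:15",
        "2017-01-02 09:31", "2017-01-01 11:21",
    ]),
    "Sales": [29.93, 130.31, 83.22, 103.12],
})

# Sort chronologically and rebuild a clean 0..n-1 index.
df = df.sort_values("DateCompleted").reset_index(drop=True)

date = df.DateCompleted.dt.date
df["NextTranSales"] = df.groupby(date).Sales.shift(-1)
print(df)
```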


If you need the LastRow column, you can find the index of the last row in each group with groupby and then assign "Yes" to those rows:

last_row_index = df.groupby(date, as_index=False).apply(lambda g: g.index[-1])
df["LastRow"] = "No"
df.loc[last_row_index, "LastRow"] = "Yes"
df


Upvotes: 5
