Luke
Luke

Reputation: 7089

Fastest way to add an extra row to a groupby in pandas

I'm trying to create a new row for each group in a dataframe by copying the last row and then modifying some values. My approach is as follows, the concat step appears to be the bottleneck (I tried append too). Any suggestions?

def genNewObs(df):
    lastRowIndex = df.obsNumber.idxmax()
    row = pd.DataFrame(df.ix[lastRowIndex].copy())
    # changes some other values in row here
    df = pd.concat([df,row], ignore_index=True)
    return df

df = df.groupby(GROUP).apply(genNewObs)

Edit 1: Basically I have a bunch of data with the last observation on different dates. I want to create a final observation for all groups on the current date.

Group   Date       Days Since last Observation
  A    1/1/2014               0
  A    1/10/2014              9
  B    1/5/2014               0
  B    1/25/2014              20
  B    1/27/2014              2

If we pretend the current date is 1/31/2014 this becomes:

Group   Date       Days Since last Observation
  A    1/1/2014               0
  A    1/10/2014              9
  A    1/31/2014              21
  B    1/5/2014               0
  B    1/25/2014              20
  B    1/27/2014              2
  B    1/31/2014              4

I've tried setting with enlargement and it is the slowest of all techniques. Any ideas?

Upvotes: 2

Views: 2897

Answers (1)

Luke
Luke

Reputation: 7089

Thanks to user1827356, I sped it up by a factor of 100 by taking the operation out of the apply. For some reason first was dropping by Group column, so I used idxmax instead.

def genNewObs(df):
    lastRowIndex = df.groupby(Group).Date.idxmax()
    rows = df.ix[lastRowIndex]
    df = pd.concat([df,rows], ignore_index=True)
    df = df.sort([Group, Date], ascending=True)
    return df

Upvotes: 1

Related Questions