Reputation: 7089
I'm trying to create a new row for each group in a dataframe by copying the last row and then modifying some values. My approach is as follows, the concat step appears to be the bottleneck (I tried append too). Any suggestions?
def genNewObs(df):
lastRowIndex = df.obsNumber.idxmax()
row = pd.DataFrame(df.ix[lastRowIndex].copy())
# changes some other values in row here
df = pd.concat([df,row], ignore_index=True)
return df
df = df.groupby(GROUP).apply(genNewObs)
Edit 1: Basically I have a bunch of data with the last observation on different dates. I want to create a final observation for all groups on the current date.
Group Date Days Since last Observation
A 1/1/2014 0
A 1/10/2014 9
B 1/5/2014 0
B 1/25/2014 20
B 1/27/2014 2
If we pretend the current date is 1/31/2014 this becomes:
Group Date Days Since last Observation
A 1/1/2014 0
A 1/10/2014 9
A 1/31/2014 21
B 1/5/2014 0
B 1/25/2014 20
B 1/27/2014 2
B 1/31/2014 4
I've tried setting with enlargement and it is the slowest of all techniques. Any ideas?
Upvotes: 2
Views: 2897
Reputation: 7089
Thanks to user1827356, I sped it up by a factor of 100 by taking the operation out of the apply. For some reason first was dropping by Group column, so I used idxmax instead.
def genNewObs(df):
lastRowIndex = df.groupby(Group).Date.idxmax()
rows = df.ix[lastRowIndex]
df = pd.concat([df,rows], ignore_index=True)
df = df.sort([Group, Date], ascending=True)
return df
Upvotes: 1