Reputation: 2863
Let's say I have a DataFrame like this:
date id val
0 2017-01-01 1 10
1 2019-01-01 1 20
2 2017-01-01 2 50
I want to group this dataset by id
.
For each group, I want to add a new row to it, with the date being be 1 year from now. This row should only be added IF it is later than the last date in the group. The row's val should be the same as the last row in the group.
The final table should look like this:
date id val
0 2017-01-01 1 10
1 2019-01-01 1 20
2 2017-01-01 2 50
3 2018-09-25 2 50 <-- new row
The current code is below. I can get a mask showing which groups need a row appended, but not sure what to do next.
>>> df = pd.DataFrame(data={'d': [datetime.date(2017, 1, 1), datetime.date(2019,1,1), datetime.date(2017,1,1)], 'id': [1,1,2], 'val': [10,20,50]})
>>> df = df.sort_values(by='d')
>>> future_date = (pd.datetime.now().date() + pd.DateOffset(years=1)).date()
>>> maxd = df.groupby('id')['d'].max()
>>> maxd < future_date
id
1 False
2 True
Name: d, dtype: bool
Upvotes: 5
Views: 9114
Reputation: 863741
You can use idxmax
withloc
for rows with max
date
:
future_date = pd.to_datetime('today') + pd.DateOffset(years=1)
maxd = df.loc[df.groupby('id')['d'].idxmax()]
maxd = maxd[maxd['d'] < future_date]
maxd['d'] = future_date
print (maxd)
d id val
2 2018-09-25 2 50
df = pd.concat([df, maxd]).sort_values(['id','d']).reset_index(drop=True)
print (df)
d id val
0 2017-01-01 1 10
1 2019-01-01 1 20
2 2017-01-01 2 50
3 2018-09-25 2 50
Upvotes: 6
Reputation: 294546
A different way to look at it, use duplicated
to find last row per 'id'
t = df[~df.duplicated('id', 'last')]
df.append(
t.assign(
date=pd.to_datetime('today') + pd.DateOffset(years=1)
).pipe(lambda d: d[d.date > t.date]),
ignore_index=True).sort_values(['id', 'date'])
date id val
0 2017-01-01 1 10
1 2019-01-01 1 20
2 2017-01-01 2 50
3 2018-09-24 2 50
Upvotes: 4
Reputation: 77027
Here's one way
In [3481]: def add_row(x):
...: next_year = pd.to_datetime('today') + pd.DateOffset(years=1)
...: if x['date'].max() < next_year:
...: last_row = x.iloc[-1]
...: last_row['date'] = next_year
...: return x.append(last_row)
...: return x
...:
In [3482]: df.groupby('id').apply(add_row).reset_index(drop=True)
Out[3482]:
date id val
0 2017-01-01 1 10
1 2019-01-01 1 20
2 2017-01-01 2 50
3 2018-09-25 2 50
Upvotes: 10