Reputation: 1466
I have a df that I'm trying to perform a groupby and shift on. However, the output isn't what I want.
I want to shift the "next" DueDate to the previous dates. So if the current DueDate is 1/1, and the next DueDate is 6/30, then insert a new column where the NextDueDate is 6/30 for all rows where DueDate==1/1. Then when the current DueDate is 6/30, then insert the next DueDate for all rows where DueDate==6/30.
Original df
ID Document Date DueDate
1 ABC 1/31 1/1
1 ABC 2/28 1/1
1 ABC 3/31 1/1
1 ABC 4/30 6/30
1 ABC 5/31 6/30
1 ABC 6/30 7/31
1 ABC 7/31 7/31
1 ABC 8/31 9/30
Desired output df
ID Document Date DueDate NextDueDate
1 ABC 1/31 1/1 6/30
1 ABC 2/28 1/1 6/30
1 ABC 3/31 1/1 6/30
1 ABC 4/30 6/30 7/31
1 ABC 5/31 6/30 7/31
1 ABC 6/30 7/31 9/30
1 ABC 7/31 7/31 9/30
1 ABC 8/31 9/30 10/31
I've tried many variations along the lines of df['NextDueDate'] = df.groupby(['ID','Document'])['DueDate'].shift(-1), but it doesn't quite get me where I want.
Upvotes: 2
Views: 1393
Reputation: 402263
Define a function f to perform replacement based on shifted dates:
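def f(x):
    # distinct due dates, in order of first appearance
    i = x.drop_duplicates()
    # each due date's successor; the last one has none, so fill in a placeholder
    j = i.shift(-1).fillna('10/30')
    # map every row's due date to its successor
    return x.map(dict(zip(i, j)))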
Now, call this function inside a groupby + apply on ID and Document:
df['NextDueDate'] = df.groupby(['ID', 'Document']).DueDate.apply(f)
df
ID Document Date DueDate NextDueDate
0 1 ABC 1/31 1/1 6/30
1 1 ABC 2/28 1/1 6/30
2 1 ABC 3/31 1/1 6/30
3 1 ABC 4/30 6/30 7/31
4 1 ABC 5/31 6/30 7/31
5 1 ABC 6/30 7/31 9/30
6 1 ABC 7/31 7/31 9/30
7 1 ABC 8/31 9/30 10/30
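Depending on the pandas version, apply on a grouped column may prepend the group keys to the result's index, in which case assigning the result back to df no longer aligns. A minimal sketch of the same idea using transform, which keeps the original index, is shown below. The name next_due and its fill parameter are illustrative, not part of the code above, and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "ID": [1] * 8,
    "Document": ["ABC"] * 8,
    "Date": ["1/31", "2/28", "3/31", "4/30", "5/31", "6/30", "7/31", "8/31"],
    "DueDate": ["1/1", "1/1", "1/1", "6/30", "6/30", "7/31", "7/31", "9/30"],
})

def next_due(x, fill="10/30"):
    # hypothetical helper: same logic as f, with the placeholder made explicit
    distinct = x.drop_duplicates()
    successor = distinct.shift(-1).fillna(fill)
    return x.map(dict(zip(distinct, successor)))

# transform returns a result indexed like df, so the assignment aligns row by row
df["NextDueDate"] = df.groupby(["ID", "Document"])["DueDate"].transform(next_due)
print(df)
```

The fill value is only a stand-in for the last due date in each group, which has no successor in the data.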
Upvotes: 3
Reputation: 323226
IIUC
s = df.groupby('DueDate', sort=False).size().to_frame('number').reset_index()
s.DueDate=s.DueDate.shift(-1).fillna('10/31')
s
Out[251]:
DueDate number
0 6/30 3
1 7/31 2
2 9/30 2
3 10/31 1
s.DueDate.repeat(s.number)
Out[252]:
0 6/30
0 6/30
0 6/30
1 7/31
1 7/31
2 9/30
2 9/30
3 10/31
Name: DueDate, dtype: object
df['Nextduedate']=s.DueDate.repeat(s.number).values
df
Out[254]:
ID Document Date DueDate Nextduedate
0 1 ABC 1/31 1/1 6/30
1 1 ABC 2/28 1/1 6/30
2 1 ABC 3/31 1/1 6/30
3 1 ABC 4/30 6/30 7/31
4 1 ABC 5/31 6/30 7/31
5 1 ABC 6/30 7/31 9/30
6 1 ABC 7/31 7/31 9/30
7 1 ABC 8/31 9/30 10/31
If you have multiple groups:
l = []
for _, df1 in df.groupby(["ID", "Document"]):
    df1 = df1.copy()  # work on a copy to avoid SettingWithCopyWarning
    s = df1.groupby('DueDate', sort=False).size().to_frame('number').reset_index()
    s.DueDate = s.DueDate.shift(-1).fillna('10/31')
    df1['Nextduedate'] = s.DueDate.repeat(s.number).values
    l.append(df1)
New_df = pd.concat(l)
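The per-group loop can probably be avoided. The sketch below is one possible loop-free variant, not the method above: it takes the grouped shift(-1) from the question, keeps only the values where the due date actually changes, and back-fills them within each group. It assumes the rows of each group are already ordered so that equal due dates sit next to each other, and it leaves the last due date of a group as NaN instead of inventing a fill value.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "ID": [1] * 8,
    "Document": ["ABC"] * 8,
    "Date": ["1/31", "2/28", "3/31", "4/30", "5/31", "6/30", "7/31", "8/31"],
    "DueDate": ["1/1", "1/1", "1/1", "6/30", "6/30", "7/31", "7/31", "9/30"],
})

keys = [df["ID"], df["Document"]]

# Next row's due date within each (ID, Document) group
nxt = df.groupby(["ID", "Document"])["DueDate"].shift(-1)

# Keep it only where it differs from the current due date (the boundary rows)
boundary = nxt.where(nxt.ne(df["DueDate"]))

# Spread each boundary value backwards over the rows that precede it
df["NextDueDate"] = boundary.groupby(keys).bfill()
print(df)
```

If a fill value for the final due date is wanted, a fillna on the new column can be added afterwards.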
Upvotes: 2