Reputation: 89
I have a data set as such and I need to group them sequentially, and join the dates.
number, date
123456,2021-01-16
123456,2021-01-18
98765,2021-01-19
98765,2021-02-01
123456,2021-02-02
123456,2021-02-03
123456,2021-02-09
123456,2021-02-11
7645323,2021-02-13
7645323,2021-02-16
7645323,2021-02-17
I want the output to be like,
number,date
123456, 2021-01-16 to 2021-01-18
98765, 2021-01-19 to 2021-02-01
123456, 2021-02-02 to 2021-02-11
7645323, 2021-02-13 to 2021-02-17
This is my code, referred from this answer is very close, but can't work on dates, and I'm not able to figure out
df.groupby(df.number!=df.number.shift())['date'].transform(lambda x: ','.join(x)).reset_index(drop=True)
I do get an output, but somewhat like this, not having number at all.
NOTE:
Link to test data and output data
Upvotes: 1
Views: 48
Reputation: 323366
You can do with shift
and cumsum
create the groupby
key
key = df.number.ne(df.number.shift()).cumsum()
out = df.groupby(key).agg({'number':'first',
'date' : lambda x : x.iloc[0] + ' to ' + x.iloc[-1]})
Out[822]:
number date
number
1 123456 2021-01-16 to 2021-01-18
2 98765 2021-01-19 to 2021-02-01
3 123456 2021-02-02 to 2021-02-11
4 7645323 2021-02-13 to 2021-02-17
Upvotes: 4