Kirti Purohit
Kirti Purohit

Reputation: 89

Group rows sequentially in python

I have a data set as such and I need to group them sequentially, and join the dates.

number, date 

123456,2021-01-16
123456,2021-01-18
98765,2021-01-19 
98765,2021-02-01
123456,2021-02-02
123456,2021-02-03
123456,2021-02-09
123456,2021-02-11
7645323,2021-02-13
7645323,2021-02-16
7645323,2021-02-17

I want the output to be like,

number,date 

123456, 2021-01-16 to 2021-01-18
98765, 2021-01-19 to 2021-02-01
123456, 2021-02-02 to 2021-02-11
7645323, 2021-02-13 to 2021-02-17

This is my code, referred from this answer is very close, but can't work on dates, and I'm not able to figure out

df.groupby(df.number!=df.number.shift())['date'].transform(lambda x: ','.join(x)).reset_index(drop=True)

I do get an output, but somewhat like this, not having number at all.

enter image description here

NOTE:

Link to test data and output data

Upvotes: 1

Views: 48

Answers (1)

BENY
BENY

Reputation: 323366

You can do with shift and cumsum create the groupby key

key = df.number.ne(df.number.shift()).cumsum()
out = df.groupby(key).agg({'number':'first',
                           'date' : lambda x : x.iloc[0] + ' to ' +  x.iloc[-1]})
Out[822]: 
         number                       date
number                                    
1        123456   2021-01-16 to 2021-01-18
2         98765  2021-01-19  to 2021-02-01
3        123456   2021-02-02 to 2021-02-11
4       7645323   2021-02-13 to 2021-02-17

Upvotes: 4

Related Questions