Reputation: 185
I have an input DataFrame that looks like the following:
NAME   TEXT                                            START  END
Tim    Tim Wagner is a teacher.                        10     20.5
Tim    He is from Cleveland, Ohio.                     20.5   40
Frank  Frank is a musician.                            40     50
Tim    He like to travel with his family               50     62
Frank  He is a performing artist who plays the cello.  62     70
Frank  He performed at the Carnegie Hall last year.    70     85
Frank  It was fantastic listening to him.              85     90
I want to merge consecutive rows that share the same NAME, so that the output DataFrame looks as follows:
NAME TEXT START END
Tim Tim Wagner is a teacher. He is from Cleveland, Ohio. 10 40
Frank Frank is a musician. 40 50
Tim He like to travel with his family 50 62
Frank He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. It was fantastic listening to him. 62 90
I'd appreciate your help on this.
Thanks
Upvotes: 0
Views: 41
Reputation: 153460
Try:
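# Number each consecutive run of the same NAME (the counter increments whenever NAME changes).
grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
# Group on NAME plus the run number, join the text and take the overall START/END span.
df.groupby(['NAME', grp], sort=False)[['TEXT', 'START', 'END']]\
  .agg({'TEXT': ' '.join, 'START': 'min', 'END': 'max'})\
  .reset_index().drop('group', axis=1)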
Output:
NAME TEXT START END
0 Tim Tim Wagner is a teacher. He is from Cleveland,... 10.0 40.0
1 Frank Frank is a musician. 40.0 50.0
2 Tim He like to travel with his family 50.0 62.0
3 Frank He is a performing artist who plays the cello.... 62.0 90.0
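For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same shift/cumsum idea. It rebuilds the sample data from the question and, as one possible variation, uses pandas named aggregation rather than the dict form above. The names run_id and result are just illustrative choices, not anything from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'NAME': ['Tim', 'Tim', 'Frank', 'Tim', 'Frank', 'Frank', 'Frank'],
    'TEXT': [
        'Tim Wagner is a teacher.',
        'He is from Cleveland, Ohio.',
        'Frank is a musician.',
        'He like to travel with his family',
        'He is a performing artist who plays the cello.',
        'He performed at the Carnegie Hall last year.',
        'It was fantastic listening to him.',
    ],
    'START': [10, 20.5, 40, 50, 62, 70, 85],
    'END': [20.5, 40, 50, 62, 70, 85, 90],
})

# True wherever NAME differs from the previous row; the running sum then
# gives every consecutive run of the same NAME its own id.
# (run_id is an illustrative name.)
run_id = df['NAME'].ne(df['NAME'].shift()).cumsum().rename('run_id')

# Collapse each run into one row: keep the name, join the text,
# and take the earliest START and latest END.
result = (
    df.groupby(run_id, sort=False)
      .agg(NAME=('NAME', 'first'),
           TEXT=('TEXT', ' '.join),
           START=('START', 'min'),
           END=('END', 'max'))
      .reset_index(drop=True)
)

print(run_id.tolist())  # [1, 1, 2, 3, 4, 4, 4]
print(result)
```

Grouping on the run id alone is enough here, because a run can only ever contain one NAME. That is why 'first' is a safe way to carry the name through.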
Upvotes: 1