I am testing an ML model and need to merge my text so I can cut my audio file and train the model. How can I merge the text based on a condition?
My goal is to merge the text in the 'Text' column until I reach end punctuation, so that each merged string forms a sentence, and to keep forming sentences until I reach the end of the text file.
I have tried using pandas groupby:
df.groupby(['Name','Speaker','StTime','EnTime'])['Text'].apply(' '.join).reset_index()
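Grouping on StTime and EnTime cannot merge anything here: both columns are unique per row, so every row ends up in its own group. A quick check, rebuilding the example data below and assuming the blank Text cells are empty strings rather than NaN:

```python
import pandas as pd

# Example data from the question; blank Text cells assumed to be '' (not NaN)
df = pd.DataFrame({
    'Name': ['s1'] * 5,
    'Speaker': ['tom'] * 5,
    'StTime': [6.8, 7.3, 7.6, 8.3, 9.2],
    'Text': ['I would say', '', 'leap frog', '', 'a pig.'],
    'EnTime': [7.3, 7.6, 8.3, 9.2, 10.1],
})

# StTime/EnTime differ on every row, so each row is its own group
out = df.groupby(['Name', 'Speaker', 'StTime', 'EnTime'])['Text'].apply(' '.join).reset_index()
print(len(out))  # 5: nothing was merged
```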
Example input (Text is empty on some rows):

Name  Speaker  StTime  Text          EnTime
s1    tom      6.8     I would say   7.3
s1    tom      7.3                   7.6
s1    tom      7.6     leap frog     8.3
s1    tom      8.3                   9.2
s1    tom      9.2     a pig.        10.1

Expected output:

Name  Speaker  StTime  Text                          EnTime
s1    tom      6.8     I would say leap frog a pig.  10.1
Use GroupBy.agg with 'first' for StTime and 'last' for EnTime (the same as GroupBy.first and GroupBy.last), and for the Text column use a custom lambda function that filters out empty strings:
df1 = (df.groupby(['Name','Speaker'], sort=False)
.agg({'StTime':'first',
'Text': lambda x: ' '.join(y for y in x if y != ''),
'EnTime':'last'})
.reset_index())
print (df1)
Name Speaker StTime Text EnTime
0 s1 tom 6.8 I would say leap frog a pig. 10.1
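The agg answer merges every row of a Name/Speaker pair into one line, which matches the example but not the end-punctuation rule from the question once a speaker has several sentences. A minimal sketch, assuming blank Text cells are empty strings and that a sentence ends with `.`, `!` or `?`: flag the ending rows, turn the flags into a running sentence id with `shift` and `cumsum`, and add that id to the grouping keys. The second sentence ("Next one.") is made-up data for illustration.

```python
import pandas as pd

# Hypothetical data: the question's rows plus a made-up second sentence.
# Blank Text cells are assumed to be empty strings, not NaN.
df = pd.DataFrame({
    'Name': ['s1'] * 6,
    'Speaker': ['tom'] * 6,
    'StTime': [6.8, 7.3, 7.6, 8.3, 9.2, 10.1],
    'Text': ['I would say', '', 'leap frog', '', 'a pig.', 'Next one.'],
    'EnTime': [7.3, 7.6, 8.3, 9.2, 10.1, 11.0],
})

# True on rows whose text ends with . ! or ? (optionally followed by spaces)
ends = df['Text'].str.contains(r'[.!?]\s*$')

# Start a new sentence id on the row *after* each ending row
sentence = ends.shift(fill_value=False).cumsum().rename('sentence')

df1 = (df.groupby(['Name', 'Speaker', sentence], sort=False)
         .agg({'StTime': 'first',
               # skip empty or whitespace-only cells when joining
               'Text': lambda x: ' '.join(y for y in x if y.strip()),
               'EnTime': 'last'})
         .reset_index()
         .drop(columns='sentence'))
print(df1)
```

Rows after the last end punctuation (a trailing fragment) keep the final sentence id, so they still come out as their own row instead of being dropped.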
Alternatively, when the frame holds a single Name/Speaker group, use:
>>> df['Text'] = df.groupby(['Name', 'Speaker'])['Text'].transform(' '.join).str.split().str.join(' ')
>>> df2 = df.head(1).copy()  # copy so the next assignment does not raise SettingWithCopyWarning
>>> df2['EnTime'] = df['EnTime'].iloc[-1]
>>> df2
Name Speaker StTime Text EnTime
0 s1 tom 6.8 I would say leap frog a pig. 10.1
>>>
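The `.str.split().str.join(' ')` step is there because joining the blank cells leaves runs of spaces; splitting on whitespace and re-joining collapses them. A small illustration with the question's Text values, assuming the blanks are empty strings:

```python
import pandas as pd

texts = pd.Series(['I would say', '', 'leap frog', '', 'a pig.'])

joined = ' '.join(texts)   # each empty cell leaves a double space
print(repr(joined))        # 'I would say  leap frog  a pig.'

# split() with no pattern splits on any whitespace run and drops empty pieces
cleaned = pd.Series([joined]).str.split().str.join(' ')
print(cleaned[0])          # I would say leap frog a pig.
```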