KD_35_CEO

Reputation: 43

Combine text of a column in dataframe with conditions in pandas/python

I am testing an ML model and need to merge my text so that I can cut my audio file and train the model. How can I merge the text using conditions?

My goal is to merge the text in the 'Text' column until I reach an end punctuation to form a sentence. I want to continue to form sentences until I reach the end of the text file.

I have tried to use pandas groupby.

df.groupby(['Name','Speaker','StTime','EnTime'])['Text'].apply(' '.join).reset_index()


Example:

Name  Speaker  StTime  Text          EnTime
s1    tom      6.8     I would say   7.3
s1    tom      7.3                   7.6
s1    tom      7.6     leap frog     8.3
s1    tom      8.3                   9.2
s1    tom      9.2     a pig.        10.1

Expected output:

Name  Speaker  StTime  Text                          EnTime
s1    tom      6.8     I would say leap frog a pig.  10.1

Upvotes: 1

Views: 656

Answers (2)

jezrael

Reputation: 863196

Use GroupBy.agg, taking the first StTime and the last EnTime of each group with 'first' and 'last', and joining the Text column with a custom lambda function that filters out empty strings:

df1 = (df.groupby(['Name','Speaker'], sort=False)
         .agg({'StTime':'first', 
               'Text': lambda x: ' '.join(y for y in x if y != ''),
               'EnTime':'last'})
         .reset_index())
print(df1)
  Name Speaker  StTime                          Text  EnTime
0   s1     tom     6.8  I would say leap frog a pig.    10.1
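
Grouping on Name and Speaker alone collapses every row for that speaker into one row, which matches the example because it holds a single sentence. If the file holds several sentences, one possible extension is to build a sentence id that increments after each row ending in end punctuation and to group on that as well. This is only a sketch: it assumes empty cells are empty strings (not NaN), and the punctuation set, the helper column name sent_id and the second sentence in the sample data are illustrative choices that do not come from the question.

```python
import pandas as pd

# Sample data: the question's rows plus a second, made-up sentence.
df = pd.DataFrame({
    'Name':    ['s1'] * 7,
    'Speaker': ['tom'] * 7,
    'StTime':  [6.8, 7.3, 7.6, 8.3, 9.2, 10.1, 10.9],
    'Text':    ['I would say', '', 'leap frog', '', 'a pig.', 'Is that', 'right?'],
    'EnTime':  [7.3, 7.6, 8.3, 9.2, 10.1, 10.9, 11.5],
})

# True on rows whose text ends with '.', '?' or '!' (assumed end punctuation).
ends = df['Text'].str.strip().str.contains(r'[.?!]$')

# A row belongs to a new sentence when the previous row ended one, so count
# the sentence ends seen before the current row.
df['sent_id'] = ends.shift(fill_value=False).astype(int).cumsum()

df1 = (df.groupby(['Name', 'Speaker', 'sent_id'], sort=False)
         .agg({'StTime': 'first',
               'Text': lambda x: ' '.join(y for y in x if y != ''),
               'EnTime': 'last'})
         .reset_index()
         .drop(columns='sent_id'))
print(df1)
```

Each resulting row then carries the start time of the first fragment and the end time of the last fragment of one sentence, which is what is needed to cut the audio.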

Upvotes: 0

U13-Forward

Reputation: 71610

Or use:

>>> df['Text'] = df.groupby(['Name', 'Speaker'])['Text'].transform(' '.join).str.split().str.join(' ')
>>> df2 = df.head(1).copy()  # copy so the assignment below does not raise SettingWithCopyWarning
>>> df2['EnTime'] = df['EnTime'].iloc[-1]
>>> df2
  Name Speaker  StTime                          Text  EnTime
0   s1     tom     6.8  I would say leap frog a pig.    10.1
>>> 
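
The .str.split().str.join(' ') step is presumably there because joining rows that include empty strings leaves doubled spaces. Splitting on whitespace and rejoining collapses them. A small illustration using the Text values from the question (the variable names are just for this example):

```python
import pandas as pd

# Text fragments from the question, including the two empty cells.
parts = ['I would say', '', 'leap frog', '', 'a pig.']

# Joining directly keeps a separator on both sides of each empty string,
# so the result has two consecutive spaces in those places.
joined = pd.Series([' '.join(parts)])

# str.split() with no arguments splits on any run of whitespace;
# str.join(' ') then rebuilds the string with single spaces.
cleaned = joined.str.split().str.join(' ')

print(repr(joined.iloc[0]))
print(repr(cleaned.iloc[0]))
```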

Upvotes: 1
