Merging rows in a dataframe depending on another column

Question

I have extracted a PDF into a dataframe and would like to merge consecutive rows when Column C shows the same speaker:

From:

  Index     Column B     Column C 
   1       'I am going'    Speaker A 
   2       'to the zoo'    Speaker A
   3       'I am going'    Speaker B 
   4       'home      '    Speaker B
   5       'I am going'    Speaker A 
   6       'to the park'   Speaker A

To:

  Index     Column B                    Column C 
   1       'I am going to the zoo'     Speaker A
   2       'I am going home'           Speaker B
   3       'I am going to the park'    Speaker A

I tried using groupby, but the order of the rows matters because the PDF is a transcript of a speech.

jpp · Accepted Answer

You can use GroupBy + agg after creating a series that increments each time Column C changes, so each consecutive run of the same speaker gets its own key:

res = (
    # key increments whenever the speaker differs from the previous row
    df.assign(key=df['Column C'].ne(df['Column C'].shift()).cumsum())
      .groupby('key').agg({'Column C': 'first', 'Column B': ' '.join})
      .reset_index()
)

print(res)

   key   Column C                    Column B
0    1  Speaker A   'I am going' 'to the zoo'
1    2  Speaker B   'I am going' 'home      '
2    3  Speaker A  'I am going' 'to the park'

Note the output has quotation marks, as per the input you have supplied. These won't show if the strings are defined without quotes.
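
For completeness, here is a minimal self-contained sketch of the same approach, assuming the strings are stored without the literal quotes and that padding such as the trailing spaces in 'home      ' should be stripped before joining. The sample dataframe is reconstructed from the question; the whitespace-stripping lambda and drop=True are my additions for illustration, not part of the code above:

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about the real
# dataframe), with the strings stored without surrounding quotes.
df = pd.DataFrame({
    "Column B": ["I am going", "to the zoo", "I am going", "home      ",
                 "I am going", "to the park"],
    "Column C": ["Speaker A", "Speaker A", "Speaker B", "Speaker B",
                 "Speaker A", "Speaker A"],
})

# True wherever the speaker differs from the previous row; the cumulative sum
# then gives every consecutive run of the same speaker its own number.
key = df["Column C"].ne(df["Column C"].shift()).cumsum()

res = (
    df.assign(key=key)
      .groupby("key")
      .agg({
          "Column C": "first",
          # Strip padding from each fragment before joining with a space.
          "Column B": lambda s: " ".join(s.str.strip()),
      })
      .reset_index(drop=True)  # drop the helper key from the result
)

print(res)
```

Here key comes out as 1, 1, 2, 2, 3, 3, so the second Speaker A run is kept separate from the first, and the result has one row per run in the original order: 'I am going to the zoo' (Speaker A), 'I am going home' (Speaker B), 'I am going to the park' (Speaker A). Grouping on Column C directly would instead merge both Speaker A runs into one row.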
