user14262559

Reputation: 185

Conditionally merge consecutive rows of a pandas dataframe

I have an input DataFrame that looks like the following:

NAME    TEXT
Tim     Tim Wagner is a teacher.
Tim     He is from Cleveland, Ohio.
Frank   Frank is a musician.
Tim     He likes to travel with his family
Frank   He is a performing artist who plays the cello.
Frank   He performed at the Carnegie Hall last year.
Frank   It was fantastic listening to him.

I want to concatenate the TEXT values of consecutive rows that have the same NAME value.

Output DataFrame:

NAME    TEXT
Tim     Tim Wagner is a teacher. He is from Cleveland, Ohio.
Frank   Frank is a musician.
Tim     He likes to travel with his family
Frank   He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. It was fantastic listening to him.

Is using pandas shift the best way to do this? I would appreciate any help.

Thanks.

Upvotes: 4

Views: 895

Answers (2)

Scott Boston

Reputation: 153560

Try:

grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
df.groupby(['NAME', grp], sort=False)['TEXT']\
  .agg(' '.join).reset_index().drop('group', axis=1)

Output:

    NAME                                               TEXT
0    Tim  Tim Wagner is a teacher. He is from Cleveland,...
1  Frank                                Frank is a musician
2    Tim                  He likes to travel with his family
3  Frank  He is a performing artist who plays the cello....
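
To see why this works, it can help to print the intermediate group labels. Below is a minimal, self-contained sketch that rebuilds the sample frame from the question; the names `starts`, `grp` and `out` are only illustrative. Comparing each NAME with the previous row flags the start of every run, and the cumulative sum of those flags gives each run of consecutive equal names its own label.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "NAME": ["Tim", "Tim", "Frank", "Tim", "Frank", "Frank", "Frank"],
        "TEXT": [
            "Tim Wagner is a teacher.",
            "He is from Cleveland, Ohio.",
            "Frank is a musician.",
            "He likes to travel with his family",
            "He is a performing artist who plays the cello.",
            "He performed at the Carnegie Hall last year.",
            "It was fantastic listening to him.",
        ],
    }
)

# True wherever NAME differs from the row above.
# The first row is compared against NaN, so it is always True.
starts = df["NAME"] != df["NAME"].shift()

# The running sum of those flags labels each consecutive run.
grp = starts.cumsum().rename("group")
print(grp.tolist())  # [1, 1, 2, 3, 4, 4, 4]

# Group on NAME plus the run label, then join the TEXT of each run.
out = (
    df.groupby(["NAME", grp], sort=False)["TEXT"]
    .agg(" ".join)
    .reset_index()
    .drop(columns="group")
)
print(out)
```

Grouping on the run label alone would give the same rows; including NAME in the key is mainly a convenient way to keep that column in the result.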

Upvotes: 5

joesph nguyen

Reputation: 108

I went row by row and just created a new DataFrame.


import pandas as pd

df = pd.DataFrame([['Tim', 'Tim Wagner is a teacher.'],
                   ['Tim', 'He is from Cleveland, Ohio.'],
                   ['Frank', 'Frank is a musician'],
                   ['Tim', 'He likes to travel with his family'],
                   ['Frank', 'He is a performing artist who plays the cello.'],
                   ['Frank', 'He performed at the Carnegie Hall last year'],
                   ['Frank', 'It was fantastic listening to him']],
                  columns=['NAME', 'TEXT'])

name = None  # NAME of the run currently being built
txt = ""     # TEXT accumulated for that run
arr = []     # finished [NAME, TEXT] pairs

for _, row in df.iterrows():
    if row['NAME'] == name:
        # Same NAME as the previous row: extend the current run.
        txt += ' ' + row['TEXT']
    else:
        # NAME changed: store the finished run (if any) and start a new one.
        if name is not None:
            arr.append([name, txt])
        name = row['NAME']
        txt = row['TEXT']

# The loop never stores the last run, so add it here.
if name is not None:
    arr.append([name, txt])

print(pd.DataFrame(arr, columns=['NAME', 'TEXT']))
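
The same row-by-row idea can be written more compactly with `itertools.groupby` from the standard library, which only groups adjacent items with equal keys. This is a different technique from the loop above, shown as a sketch on the sample data; the names `rows`, `merged` and `result` are just illustrative.

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame(
    [
        ["Tim", "Tim Wagner is a teacher."],
        ["Tim", "He is from Cleveland, Ohio."],
        ["Frank", "Frank is a musician"],
        ["Tim", "He likes to travel with his family"],
        ["Frank", "He is a performing artist who plays the cello."],
        ["Frank", "He performed at the Carnegie Hall last year"],
        ["Frank", "It was fantastic listening to him"],
    ],
    columns=["NAME", "TEXT"],
)

# Pair each NAME with its TEXT, in row order.
rows = zip(df["NAME"], df["TEXT"])

# groupby starts a new group every time the key (the NAME) changes,
# so each group is one run of consecutive rows with the same NAME.
merged = [
    (name, " ".join(text for _, text in run))
    for name, run in groupby(rows, key=lambda pair: pair[0])
]

result = pd.DataFrame(merged, columns=["NAME", "TEXT"])
print(result)
```

For large frames the vectorised shift/cumsum approach is likely to be faster, since this version still iterates over the rows in Python.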

Upvotes: 0
