Reputation: 185
I have an input DataFrame that looks like the following:
NAME TEXT
Tim Tim Wagner is a teacher.
Tim He is from Cleveland, Ohio.
Frank Frank is a musician.
Tim He like to travel with his family
Frank He is a performing artist who plays the cello.
Frank He performed at the Carnegie Hall last year.
Frank It was fantastic listening to him.
I want to concatenate the TEXT values whenever consecutive rows have the same value in the NAME column.
Output Dataframe:
NAME TEXT
Tim Tim Wagner is a teacher. He is from Cleveland, Ohio.
Frank Frank is a musician.
Tim He like to travel with his family
Frank He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. It was fantastic listening to him.
Is using pandas shift the best way to do this? I'd appreciate any help.
Thanks
Upvotes: 4
Views: 895
Reputation: 153560
Try:
grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
df.groupby(['NAME', grp], sort=False)['TEXT']\
.agg(' '.join).reset_index().drop('group', axis=1)
Output:
NAME TEXT
0 Tim Tim Wagner is a teacher. He is from Cleveland,...
1 Frank Frank is a musician
2 Tim He likes to travel with his family
3 Frank He is a performing artist who plays the cello....
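To make the shift/cumsum step easier to follow, here is a self-contained sketch of the same idea with the intermediate values named. The sample frame is retyped from the question, and the names `starts_new_run`, `run_id`, and `result` are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data retyped from the question (assumed to match the asker's frame).
df = pd.DataFrame(
    {
        "NAME": ["Tim", "Tim", "Frank", "Tim", "Frank", "Frank", "Frank"],
        "TEXT": [
            "Tim Wagner is a teacher.",
            "He is from Cleveland, Ohio.",
            "Frank is a musician.",
            "He likes to travel with his family",
            "He is a performing artist who plays the cello.",
            "He performed at the Carnegie Hall last year.",
            "It was fantastic listening to him.",
        ],
    }
)

# True wherever NAME differs from the previous row. The first row is always
# True because shift() leaves a missing value there.
starts_new_run = df["NAME"] != df["NAME"].shift()

# A running total of those flags gives each consecutive run its own id.
run_id = starts_new_run.cumsum().rename("group")
print(run_id.tolist())  # [1, 1, 2, 3, 4, 4, 4]

# Group by NAME plus the run id, so the two separate "Tim" runs stay apart.
result = (
    df.groupby(["NAME", run_id], sort=False)["TEXT"]
    .agg(" ".join)
    .reset_index()
    .drop(columns="group")
)
print(result)
```

Grouping on the run id alone would also separate the runs; including NAME in the keys simply carries that column through to the result.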
Upvotes: 5
Reputation: 108
I went row by row and just created a new DataFrame.
import pandas as pd

df = pd.DataFrame([['Tim', 'Tim Wagner is a teacher.'],
                   ['Tim', 'He is from Cleveland, Ohio.'],
                   ['Frank', 'Frank is a musician'],
                   ['Tim', 'He likes to travel with his family'],
                   ['Frank', 'He is a performing artist who plays the cello.'],
                   ['Frank', 'He performed at the Carnegie Hall last year'],
                   ['Frank', 'It was fantastic listening to him']],
                  columns=['NAME', 'TEXT'])

col = None  # NAME of the run currently being built
txt = ""    # concatenated TEXT for that run
arr = []    # finished [NAME, TEXT] pairs

for _, row in df.iterrows():
    if col == row['NAME']:
        # Same NAME as the previous row: extend the current run.
        txt += ' ' + row['TEXT']
    else:
        # NAME changed: store the finished run (if any) and start a new one.
        if col is not None:
            arr.append([col, txt])
        col = row['NAME']
        txt = row['TEXT']

# Store the last run, which the loop itself never appends.
if col is not None:
    arr.append([col, txt])

print(pd.DataFrame(arr, columns=['NAME', 'TEXT']))
Upvotes: 0