Rollo99

Reputation: 1613

How to use groupby in Python to merge text while keeping the other rows fixed?

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'Date':['2022-01-01', '2022-01-01','2022-01-01','2022-02-01','2022-02-01',
                      '2022-03-01','2022-03-01','2022-03-01'],
              'Type': ['R','R','R','P','P','G','G','G'],
              'Class':[1,1,1,0,0,2,2,2],
              'Text':['Hello-','I would like.','to be merged.','with all other.',
                      'sentences that.','belong to my same.','group.','thanks a lot.']})

df.index =[1,1,1,2,2,3,3,3]

What I would like to do is group by the index and join the values of the Text column, while keeping only the first row's value for each of the other columns.

I tried the following two approaches without success. I probably need to combine them, but I have no idea how to do it.

# Approach 1
df.groupby([df.index],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))

# Approach 2
df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Text': 'join'})

The outcome should be:


Date          Type   Class   Text
2022-01-01     R      1      Hello. I would like to be merged.
2022-02-01     P      0      with all other sentences that.
2022-03-01     G      2      belong to my same. group. thanks a lot.

Can anyone help me do it?

Thanks!

Upvotes: 1

Views: 111

Answers (1)

JANO

Reputation: 3076

My idea would be to take your second approach, aggregate the text of each group into a list, and then join the individual strings like this:

new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Text': list})
new_df['Text'] = new_df['Text'].str.join('')
print(new_df)

Output:


    Date         Type   Class   Text
0   2022-01-01   R      1       Hello-I would like.to be merged.
1   2022-02-01   P      0       with all other.sentences that.
2   2022-03-01   G      2       belong to my same.group.thanks a lot.

I found out you can also do it in a single statement by passing the join function directly as the aggregator (same approach):

new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Text': ''.join})
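
Note that ''.join glues the fragments together with no separator, while the expected outcome in the question shows a space between them. If that is what you want, a minimal sketch would be to pass ' '.join instead. The space separator and the dictionary built from df.columns are my own assumptions here, not something the question requires, so adjust them to your data:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-02-01', '2022-02-01',
             '2022-03-01', '2022-03-01', '2022-03-01'],
    'Type': ['R', 'R', 'R', 'P', 'P', 'G', 'G', 'G'],
    'Class': [1, 1, 1, 0, 0, 2, 2, 2],
    'Text': ['Hello-', 'I would like.', 'to be merged.', 'with all other.',
             'sentences that.', 'belong to my same.', 'group.', 'thanks a lot.'],
})
df.index = [1, 1, 1, 2, 2, 3, 3, 3]

# Keep the first value of every column except 'Text' ...
agg_spec = {col: 'first' for col in df.columns if col != 'Text'}
# ... and join the text fragments of each group with a single space
# (assumption: a space is the separator you want).
agg_spec['Text'] = ' '.join

# level=0 groups on the (duplicated) index values 1, 2, 3.
new_df = df.groupby(level=0).agg(agg_spec)
print(new_df)
```

With groupby(level=0) the group labels 1, 2, 3 stay as the index of the result; add .reset_index(drop=True) if you prefer a fresh 0-based index.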

Upvotes: 1
