RoccoMaxamas
RoccoMaxamas

Reputation: 349

Combining text values in a pandas dataframe column based on same value in another column

I have data where I may have different people associated with the same entry.

I need to combine the two entries together and note that two people are on it.

For example, the data may look like:

Name Share_ID value1 value2 value3 etc.
Joe  0001     1      2      4
Ann  0002     2      5      2
Mel  0001     1      2      4

The output would need to be:

Name      Share_ID value1 value2 value3 etc.
Joe, Mel  0001     1      2      4
Ann       0002     2      5      2

I tried to use groupby

df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index()

But my result from that was just:

Share_ID Name
0001     Joe, Mel
0002     Ann

The Name column combined correctly, but I lost the other columns. Note that I do not want the other columns to have anything applied to them--Joe and Ann's records are identical.

I think my approach is off, but I'm not sure what function to use.

Upvotes: 1

Views: 1233

Answers (3)

mukulgarg94
mukulgarg94

Reputation: 51

Any particular reason for not using values fields in group by?

df1.groupby(['Share_ID','value1', 'value2', 'value3'])['Name'].apply(', '.join).reset_index()

This will give the required output.

Upvotes: 0

Ami Tavory
Ami Tavory

Reputation: 76346

You can take the outcome you got, merge it with the original dataframe, and drop duplicates:

pd.merge(df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index(), df1, on='Share_ID').drop_duplicates(subset='Share_ID')

Upvotes: 1

it's-yer-boy-chet
it's-yer-boy-chet

Reputation: 2017

Starting where you left off you could just join your resulting data set back to the initial DataFrame:

# Find the merged name data set and rename the 'Name' column
names = df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index().rename(columns={'Name':'Merged Name'})
# Join it to the original dataset
df1 = df1.merge(names, on='Share_ID')
# Drop the 'Name' column then drop duplicates.
df1 = df1.drop(columns=['Name']).drop_duplicates()

Upvotes: 1

Related Questions