Reputation: 349
I have data where I may have different people associated with the same entry.
I need to combine the two entries together and note that two people are on it.
For example, the data may look like:
Name Share_ID value1 value2 value3 etc.
Joe 0001 1 2 4
Ann 0002 2 5 2
Mel 0001 1 2 4
The output would need to be:
Name Share_ID value1 value2 value3 etc.
Joe, Mel 0001 1 2 4
Ann 0002 2 5 2
I tried to use groupby
df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index()
But my result from that was just:
Share_ID Name
0001 Joe, Mel
0002 Ann
The Name column combined correctly, but I lost the other columns. Note that I do not want the other columns to have anything applied to them--Joe and Ann's records are identical.
I think my approach is off, but I'm not sure what function to use.
Upvotes: 1
Views: 1233
Reputation: 51
Any particular reason for not using values
fields in group by?
df1.groupby(['Share_ID','value1', 'value2', 'value3'])['Name'].apply(', '.join).reset_index()
This will give the required output.
Upvotes: 0
Reputation: 76346
You can take the outcome you got, merge it with the original dataframe, and drop duplicates:
pd.merge(df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index(), df1, on='Share_ID').drop_duplicates(subset='Share_ID')
Upvotes: 1
Reputation: 2017
Starting where you left off you could just join your resulting data set back to the initial DataFrame:
# Find the merged name data set and rename the 'Name' column
names = df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index().rename(columns={'Name':'Merged Name'})
# Join it to the original dataset
df1 = df1.merge(names, on='Share_ID')
# Drop the 'Name' column then drop duplicates.
df1 = df1.drop(columns=['Name']).drop_duplicates()
Upvotes: 1