Reputation: 45
I have two diffent dataframes with two columns and i want to merge them + get them sum of column B. The problem is dataframe 1 have some data, that i want to keep. I'll write an example so it make sense
Dataframe 1
Columns A Column B
House walls,doors,rooms
Animal Legs,nose,eyes
car tires,engine
Dataframe 2
Column A Column B
House windows,kitchen
Bike wheels,bicycle chain
Desired result
Column A Column B
House walls,doors,rooms,windows,kitchen
Animal Legs,nose,eyes
Car tires,engine
Bike wheels,bicycle chain
The merge function doesnt help and i tried to use pd.concat and then somehow aggregate data but didnt help either. Someone got an idea of how to solve it?
Upvotes: 3
Views: 63
Reputation: 1281
pd.concat([df1, df2]).groupby("Column A")["Column B"].apply(', '.join).reset_index()
After concating your dataframes, group your values by Column A, then use apply
to concat the grouped strings in column B, and finally restore Column A with reset_index()
.
Edit: expansion on comments
To remove duplicates, you can use the set
data structure, which only keeps a single version of each element you put into it. For each row x, split the words, then convert the list of words into a set:
df4 = df3["Column B"].apply(lambda x: set(x.split(", "))).reset_index()
Note that after this, your column B will contain sets. I'll let you figure out how to reconvert from a set to a string using a similar pattern.
Upvotes: 4