Yaitzme
Yaitzme

Reputation: 241

Concatenate a set of column values based on another column in Pandas

Given a Pandas dataframe which has a few labeled series in it, say Name and Villain.

Say the dataframe has values such:
Name: {'Batman', 'Batman', 'Spiderman', 'Spiderman', 'Spiderman', 'Spiderman'}
Villain: {'Joker', 'Bane', 'Green Goblin', 'Electro', 'Venom', 'Dr Octopus'}

In total the above dataframe has 2 series(or columns) each with six datapoints.

Now, based on the Name, I want to concatenate 3 more columns: FirstName, LastName, LoveInterest to each datapoint.

The result of which adds 'Bruce; Wayne; Catwoman' to every row which has Name as Batman. And 'Peter; Parker; MaryJane' to every row which has Name as Spiderman.

The final result should be a dataframe containing 5 columns(series) and 6 rows each.

Upvotes: 6

Views: 11068

Answers (1)

juanpa.arrivillaga
juanpa.arrivillaga

Reputation: 95908

This is a classic inner-join scenario. In pandas, use the merge module-level function:

In [13]: df1
Out[13]: 
        Name       Villain
0     Batman         Joker
1     Batman          Bane
2  Spiderman  Green Goblin
3  Spiderman       Electro
4  Spiderman         Venom
5  Spiderman   Dr. Octopus

In [14]: df2
Out[14]: 
  FirstName LastName LoveInterest       Name
0     Bruce    Wayne     Catwoman     Batman
1     Peter   Parker     MaryJane  Spiderman

In [15]: pd.DataFrame.merge(df1,df2,on='Name')
Out[15]: 
        Name       Villain FirstName LastName LoveInterest
0     Batman         Joker     Bruce    Wayne     Catwoman
1     Batman          Bane     Bruce    Wayne     Catwoman
2  Spiderman  Green Goblin     Peter   Parker     MaryJane
3  Spiderman       Electro     Peter   Parker     MaryJane
4  Spiderman         Venom     Peter   Parker     MaryJane
5  Spiderman   Dr. Octopus     Peter   Parker     MaryJane

Upvotes: 10

Related Questions