d4rty
d4rty

Reputation: 4188

Combine/merge DataFrames with different indexes and different column names

I have problems to merge two dataframes in the desired way. I unsuccessfully tried out a lot with merge and join methods but I did not achieve the desired result.

import pandas as pd

d = {'A': [1, 1, 0, 1, 0, 1, 0],
     'B': [0, 0, 0, 0, 0, 1, 1]
     }
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)

d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
     'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
     'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}


df2 = pd.DataFrame(data=d)
print(df2)

Console output:

   A  B
A  1  0
B  1  0
C  0  0
D  1  0
E  0  0
F  1  1
G  0  1

  A2  B2  C3
0  A  AA   1
1  A  AA  11
2  B  BB  35
3  C  CC  53
4  C  CC   2
5  E  EE  76
6  X  XX  45
7  F  FF   5
8  G  GG  34

I'm looking for a way to compute the following: Via the index of df I can look up in column A2 of df2 the value of B2 which should be added to df.

Desired result:

   A  B  B2
A  1  0  AA
B  1  0  BB
C  0  0  CC
D  1  0  DD
E  0  0  EE
F  1  1  FF
G  0  1  GG

(This is only dummy data, just duplicating the index and write it in column B2 of df is not sufficient)

Upvotes: 4

Views: 1716

Answers (2)

daniel_hck
daniel_hck

Reputation: 1140

I know this has been already answered by W-B in a very elegant way.

However, since I have spent the time to solve this in a less professional way, let me post also my solution.

From:

I'm looking for a way to compute the following: Via the index of df I can look up in column A2 of df2 the value of B2 which should be added to df.

I understood I should do:

  1. get index list form df. So A, B, C...
  2. look values in df2['B2'] in the same index than df2['A2'] for each element of df index
  3. create a new column ['B2'] in df, where we copy these values from df2['B2'] matching the index from df to the elements on df2['A2']

This is my code:

import pandas as pd

d = {'A': [1, 1, 0, 1, 0, 1, 0],
     'B': [0, 0, 0, 0, 0, 1, 1]
     }
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)

d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
     'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
     'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}


df2 = pd.DataFrame(data=d)
print(df2)

llista=[]
for i in df.index:
    m=df2['A2'].loc[df2['A2']==i].index
    if m[0]:
        print m[0],i
        llista.append(df2['B2'].iloc[m[0]])
    else:
        llista.append([])
df['B2'] = llista

Output is:

   A  B  B2
A  1  0  AA
B  1  0  BB
C  0  0  CC
D  1  0  []
E  0  0  EE
F  1  1  FF
G  0  1  GG

As you can see is different than the accepted post. This is because there is no 'D' index in df2['A2']

Upvotes: 0

BENY
BENY

Reputation: 323226

set_index and assign it

df['B2']=df2.drop_duplicates('A2').set_index('A2')['B2']
df
Out[728]: 
   A  B  B2
A  1  0  AA
B  1  0  BB
C  0  0  CC
D  1  0  DD
E  0  0  EE
F  1  1  FF
G  0  1  GG

Upvotes: 3

Related Questions