Simd

Reputation: 21233

How to join two tables while merging column names

I have two data frames df1 and df2. One looks like

  Surname Knownas        TB
0   K      S             79.3
1   T      J             79.1
2   I      S             78.3
3   P      B             78.2
4   W      A             78.1

The other one looks like

  Mathematics           Name
0          A*           H,E
1          A*           P,E 
2          A*           L,J 
3          A*           W,D 
4          A            C,K    

I would like to join these two data frames but there is a problem.

I would like to use Name as the key for df2, but for df1 I need to concatenate the fields Surname and Knownas with a comma in between and use that as the key. In other words, the keys from df1 would be "K,S", "T,J", "I,S" and so on.

I have read and reread the manual but I can't see how to do this.

Upvotes: 1

Views: 54

Answers (1)

MaxU - stand with Ukraine

Reputation: 210842

I would expand the Name column into two columns (Surname and Knownas) and merge using Surname and Knownas columns in both DFs:

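import io
import pandas as pd

# Sample data for df1, parsed from whitespace-separated text
data = """\
  Surname Knownas        TB
0   K      S             79.3
1   T      J             79.1
2   I      S             78.3
3   P      B             78.2
4   W      A             78.1
"""

# A raw string avoids the invalid-escape warning for '\s'
df1 = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col=0)
print(df1)

# Sample data for df2, with an extra row 5 (K,S) so that one row matches
data = """\
Mathematics           Name
0          A*           H,E
1          A*           P,E
2          A*           L,J
3          A*           W,D
4          A            C,K
5          A            K,S
"""
df2 = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col=0)
print(df2)

# Split Name on the comma into separate Surname and Knownas columns
df2[['Surname', 'Knownas']] = df2.Name.str.split(',', expand=True)
print(df2)

# Inner join on both key columns
merge = pd.merge(df1, df2, on=['Surname','Knownas'])
print(merge)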

Output:

  Surname Knownas    TB
0       K       S  79.3
1       T       J  79.1
2       I       S  78.3
3       P       B  78.2
4       W       A  78.1
  Mathematics Name
0          A*  H,E
1          A*  P,E
2          A*  L,J
3          A*  W,D
4           A  C,K
5           A  K,S
  Mathematics Name Surname Knownas
0          A*  H,E       H       E
1          A*  P,E       P       E
2          A*  L,J       L       J
3          A*  W,D       W       D
4           A  C,K       C       K
5           A  K,S       K       S
  Surname Knownas    TB Mathematics Name
0       K       S  79.3           A  K,S
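The inner merge above keeps only the single matching row. If you also want to see which rows of df1 found no partner, a left merge with an indicator column is one option. This is only a sketch: the frames are rebuilt directly from the sample rows rather than parsed, and the names checked and unmatched are mine, not part of the original answer.

```python
import pandas as pd

# Sample rows from above, built directly (df2 already split into two key columns).
df1 = pd.DataFrame({
    "Surname": ["K", "T", "I", "P", "W"],
    "Knownas": ["S", "J", "S", "B", "A"],
    "TB": [79.3, 79.1, 78.3, 78.2, 78.1],
})
df2 = pd.DataFrame({
    "Mathematics": ["A*", "A*", "A*", "A*", "A", "A"],
    "Surname": ["H", "P", "L", "W", "C", "K"],
    "Knownas": ["E", "E", "J", "D", "K", "S"],
})

# how="left" keeps every df1 row; indicator=True adds a _merge column
# whose value is "both" for matched rows and "left_only" for unmatched ones.
checked = pd.merge(df1, df2, on=["Surname", "Knownas"], how="left", indicator=True)

# Hypothetical helper frame: the df1 rows with no partner in df2.
unmatched = checked[checked["_merge"] == "left_only"]

print(checked)
print(unmatched)
```

Unmatched rows get NaN in the Mathematics column, so the indicator is mainly a convenience for filtering.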

Alternatively, you can create a Name column in df1 and merge both data frames on Name:

df1['Name'] = df1.Surname + ',' + df1.Knownas
merge = pd.merge(df1, df2, on=['Name'])
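As a self-contained sketch of this second approach: the trailing spaces in some Name values are an assumption on my part, modelled on the raw table in the question, to show why stripping whitespace before merging can matter. With the whitespace-separated parsing used above they would already be gone.

```python
import pandas as pd

# Sample rows from above, built directly instead of parsed from text.
df1 = pd.DataFrame({
    "Surname": ["K", "T", "I", "P", "W"],
    "Knownas": ["S", "J", "S", "B", "A"],
    "TB": [79.3, 79.1, 78.3, 78.2, 78.1],
})
df2 = pd.DataFrame({
    "Mathematics": ["A*", "A*", "A*", "A*", "A", "A"],
    # Trailing spaces are an assumption, to mimic untidy input.
    "Name": ["H,E", "P,E ", "L,J ", "W,D ", "C,K", "K,S "],
})

# Build the composite key in df1: Surname, a comma, then Knownas.
df1["Name"] = df1["Surname"] + "," + df1["Knownas"]

# Normalise the key in df2 so that "K,S " compares equal to "K,S".
df2["Name"] = df2["Name"].str.strip()

# Inner join on the shared Name column.
merged = df1.merge(df2, on="Name")
print(merged)
```

Without the strip step, "K,S " and "K,S" would be treated as different keys and the merge would come back empty.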

PS: I intentionally added row 5 to the second data frame so that at least one row can be matched.

Upvotes: 1
