Reputation: 21233
I have two data frames df1 and df2. One looks like
  Surname Knownas    TB
0       K       S  79.3
1       T       J  79.1
2       I       S  78.3
3       P       B  78.2
4       W       A  78.1
The other looks like
  Mathematics Name
0          A*  H,E
1          A*  P,E
2          A*  L,J
3          A*  W,D
4           A  C,K
I would like to join these two data frames but there is a problem.
I would like to use Name as the key for df2, but for df1 I need to concatenate the fields Surname and Knownas with a comma in between and use that as the key. In other words, the keys from df1 would be "K,S", "T,J", "I,S" and so on.
I have read and reread the manual but I can't see how to do this.
Upvotes: 1
Views: 54
Reputation: 210842
I would expand the Name column into two columns (Surname and Knownas) and merge on the Surname and Knownas columns in both DFs:
from io import StringIO

import pandas as pd

# First frame; the leading number on each row becomes the index.
data = """\
Surname Knownas TB
0 K S 79.3
1 T J 79.1
2 I S 78.3
3 P B 78.2
4 W A 78.1
"""
df1 = pd.read_csv(StringIO(data), sep=r'\s+', index_col=0)
print(df1)

# Second frame; row 5 (K,S) was added so that one row matches df1.
data = """\
Mathematics Name
0 A* H,E
1 A* P,E
2 A* L,J
3 A* W,D
4 A C,K
5 A K,S
"""
df2 = pd.read_csv(StringIO(data), sep=r'\s+', index_col=0)
print(df2)

# Split Name on the comma into two new columns.
df2[['Surname', 'Knownas']] = df2.Name.str.split(',', expand=True)
print(df2)

# Inner join on both name columns.
merge = pd.merge(df1, df2, on=['Surname', 'Knownas'])
print(merge)
Output:
  Surname Knownas    TB
0       K       S  79.3
1       T       J  79.1
2       I       S  78.3
3       P       B  78.2
4       W       A  78.1

  Mathematics Name
0          A*  H,E
1          A*  P,E
2          A*  L,J
3          A*  W,D
4           A  C,K
5           A  K,S

  Mathematics Name Surname Knownas
0          A*  H,E       H       E
1          A*  P,E       P       E
2          A*  L,J       L       J
3          A*  W,D       W       D
4           A  C,K       C       K
5           A  K,S       K       S

  Surname Knownas    TB Mathematics Name
0       K       S  79.3           A  K,S
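Since the inner merge above keeps only the matching row, it can be useful to check which df1 rows found no partner. The following is only a sketch on shortened, hypothetical copies of the two frames; the variable name checked, the n=1 argument and the left merge with indicator=True are my own additions for illustration, not part of the steps above.

```python
import pandas as pd

# Shortened, hypothetical copies of df1 and df2 (not the full data).
df1 = pd.DataFrame({
    "Surname": ["K", "T", "I"],
    "Knownas": ["S", "J", "S"],
    "TB": [79.3, 79.1, 78.3],
})
df2 = pd.DataFrame({
    "Mathematics": ["A*", "A"],
    "Name": ["H,E", "K,S"],
})

# Split on the first comma only (n=1) so exactly two columns come back.
df2[["Surname", "Knownas"]] = df2["Name"].str.split(",", n=1, expand=True)

# how="left" keeps every df1 row; indicator=True adds a "_merge" column
# that says whether a row was found in both frames or only in df1.
checked = df1.merge(df2, on=["Surname", "Knownas"], how="left", indicator=True)
print(checked[["Surname", "Knownas", "_merge"]])
# _merge is "both" for K,S and "left_only" for the other two rows
```

Rows marked left_only are the ones whose Surname/Knownas pair does not occur in df2.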
Alternatively, you can create a Name column in DF1 and merge both DFs on the Name column:
df1['Name'] = df1.Surname + ',' + df1.Knownas
merge = pd.merge(df1, df2, on=['Name'])
PS: I intentionally added row 5 to the second data frame so that at least one row can be matched.
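For completeness, the Name-column alternative can be sketched end to end without read_csv. This is a minimal sketch that assumes the same sample data as above, built directly with pd.DataFrame; it uses Series.str.cat to build the key, which should be equivalent to the + concatenation shown earlier, and the variable name merged is just illustrative.

```python
import pandas as pd

# Same sample data as in the answer, built directly instead of parsed.
df1 = pd.DataFrame({
    "Surname": ["K", "T", "I", "P", "W"],
    "Knownas": ["S", "J", "S", "B", "A"],
    "TB": [79.3, 79.1, 78.3, 78.2, 78.1],
})
df2 = pd.DataFrame({
    "Mathematics": ["A*", "A*", "A*", "A*", "A", "A"],
    "Name": ["H,E", "P,E", "L,J", "W,D", "C,K", "K,S"],
})

# Build the "Surname,Knownas" key in df1 so it matches the format of df2.Name.
df1["Name"] = df1["Surname"].str.cat(df1["Knownas"], sep=",")

# Inner merge on the shared key; only K,S occurs in both frames.
merged = df1.merge(df2, on="Name")
print(merged)
```

Which variant to prefer probably depends on whether you want the separate Surname and Knownas columns or the combined Name column as the lasting key.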
Upvotes: 1