Reputation: 59
I'm trying to merge two dataframes wherever df1['LineA']
matches with df2['LineA']
.
But for index 1 of df2
, this just provides me Line A: 2.
I'm having trouble obtaining the 3rd line of my desired output.
df1
PointA LineA PointB ID
0 A 2 D
1 A 3 K
2 B 2 F
3 C 3 M
df2
PointA LineA ID
0 A 2 129
1 A 2(3) 233
2 A 2 560
3 A 3 321
4 A 3 340
5 B 2 213
Desired Output:
PointA LineA PointB ID
0 A 2 D 129
1 A 2 D 233
2 A 2 D 560
3 A 3 K *233*
4 A 3 K 321
5 A 3 K 340
6 B 2 F 213
7 C 3 M
I've used the below code:
df2.LineA = df2.LineA.map(lambda x: difflib.get_close_matches(x, df1.LineA)[0])
Could someone point me in the right direction. Thanks
Upvotes: 1
Views: 45
Reputation: 6495
One way is to deal with the parentheses and make a new row using pandas.DataFrame.explode:
df1 = pd.DataFrame({'PointA':['A','A','B','C'],
'LineA': [2,3,2,3],
'LineB': ['D', 'K', 'F', 'M']})
df2 = pd.DataFrame({'PointA':['A','A','A','A','A','B'],
'LineA': [2, '2(3)', 2, 3,3, 2],
'ID': [129, 233, 560, 321, 340, 213]})
df1['LineA'] = df1['LineA'].astype(str)
df2['LineA'] = df2['LineA'].astype(str)
df2['LineA'] = df2['LineA'].str.replace(')', '').str.split(r'(')
df2 = df2.explode('LineA')
pd.merge(df1, df2, on=['PointA', 'LineA'], how='outer')
PointA LineA LineB ID
0 A 2 D 129.0
1 A 2 D 233.0
2 A 2 D 560.0
3 A 3 K 233.0
4 A 3 K 321.0
5 A 3 K 340.0
6 B 2 F 213.0
7 C 3 M NaN
Upvotes: 2