Reputation: 55
This is a little complicated, but I basically need to align the data from a second dataframe to the values of the first dataframe(DF). The first DF has different versions of the right name and the second DF has the correct names. The final product should have the original names with a column next to it with the correct names.
The first DF will have many more values than the second. I don't think it's possible to automate the whole process, because the first DF can have so many variations. I want to just be able to match the exact matches and I'll do the rest manually. Maybe I will also do an apply function to check if removing () results in a direct match. But basically, I need to align exact matches next to each other. If it can't be done for every value, it would still be fine to match the ones that are possible and I could ffill or fill rest in manually. What would be the best way to tackle this? I must've used them wrong, but .concat and .merge were not very helpful.
data1 = {'Name': ['Fuel 1 1-2-1','Fuel(1) 1-2-1', 'Fuel(1) 1-2-2', 'Fuel 2 2-2-1', 'Storage 1 2-2-2','Storage (2) 1-2-2','Storage 2 1-2-2',
'Tank 2']}
Name
0 Fuel 1 1-2-1
1 Fuel(1) 1-2-1
2 Fuel(1) 1-2-2
3 Fuel 2 2-2-1
4 Storage 1 2-2-2
5 Storage (2) 1-2-2
6 Storage 2 1-2-2
7 Tank 2
data2 = {'Required Change': ['Fuel 1 1-2-1', "Fuel 1 1-2-2",
'Fuel 2-2-1','Storage 1 2-2-2','Storage 2 1-2-2', 'Tank 2', 'Tank 3']}
Required Change
0 Fuel 1 1-2-1
1 Fuel 1 1-2-2
2 Fuel 2-2-1
3 Storage 1 2-2-2
4 Storage 2 1-2-2
5 Tank 2
6 Tank 3
Need it to look something like this:
data3 = {'Name': ['Fuel 1 1-2-1','Fuel(1) 1-2-1', 'Fuel(1) 1-2-2', 'Fuel 2 2-2-1', 'Storage 1 2-2-2','Storage (2) 1-2-2','Storage 2 1-2-2',
'Tank 2'],
'Required Change': ['Fuel 1 1-2-1', 'Fuel 1 1-2-1', "Fuel 1 1-2-2", 'Fuel 2-2-1','Storage 1 2-2-2','Storage 2 1-2-2',
'Storage 2 1-2-2', 'Tank 2']
}
Name Required Change
0 Fuel 1 1-2-1 Fuel 1 1-2-1
1 Fuel(1) 1-2-1 Fuel 1 1-2-1
2 Fuel(1) 1-2-2 Fuel 1 1-2-2
3 Fuel 2 2-2-1 Fuel 2-2-1
4 Storage 1 2-2-2 Storage 1 2-2-2
5 Storage (2) 1-2-2 Storage 2 1-2-2
6 Storage 2 1-2-2 Storage 2 1-2-2
7 Tank 2 Tank 2
Upvotes: 1
Views: 138
Reputation: 93181
If what you want is to clean up data1
and find the exact match in data2
, you can try this:
data1["Cleaned"] = (
data1["Name"].str.replace(r"\(|\)", " ", regex=True)
.str.replace(" {2,}", " ", regex=True)
)
data1.merge(data2, how="left", left_on="Cleaned", right_on="Required Change")
Result:
Name Cleaned Required Change
0 Fuel 1 1-2-1 Fuel 1 1-2-1 Fuel 1 1-2-1
1 Fuel(1) 1-2-1 Fuel 1 1-2-1 Fuel 1 1-2-1
2 Fuel(1) 1-2-2 Fuel 1 1-2-2 Fuel 1 1-2-2
3 Fuel 2 2-2-1 Fuel 2 2-2-1 NaN # Can't find match here
4 Storage 1 2-2-2 Storage 1 2-2-2 Storage 1 2-2-2
5 Storage (2) 1-2-2 Storage 2 1-2-2 Storage 2 1-2-2
6 Storage 2 1-2-2 Storage 2 1-2-2 Storage 2 1-2-2
7 Tank 2 Tank 2 Tank 2
This solution replaces (
and )
with whitespace and collapse multiple white spaces into 1. Your real data possibly have more variations and you need to adjust the regex pattern accordingly.
Upvotes: 1