Reputation: 11163
I have 2 databases in excel. In database A, I have the names of various companies, cities, and charities. Database B is the same. However Database B is filled out by the customer. As such, I get a lot of random mistakes and/or variations on the legal name.
What is the best way to match the names?
Here are some sample differences:
Is there a good solution to this? I know this question is a bit of a long shot, but if I can do this I will have saved people in my company like thousands of hours each year...
Any advice will be greatly appreciated
Upvotes: 0
Views: 350
Reputation: 28761
This is a very complex problem. Look up "master data management" and "dedup". This wikipedia article is a good starting point.
The problem is best solved in small chunks. My recommendation is to read up a little and implement a tool that lists potential duplicates and some easy way to merge them. The keyword here is potential; you don't want to do wrong merges and false positives are very likely and very harmful.
Upvotes: 2
Reputation: 569
You could use regular expressions to filter these databases.
http://en.wikipedia.org/wiki/Regular_expression
http://www.zytrax.com/tech/web/regex.htm
You can have a program pattern match based on the relevant part of a company name. For example, If someone puts in Microsoft Corporation of Redmond, and your program pattern matches against 'Microsoft' you'd get a hit.
Upvotes: 0