Reputation: 725
I am working on an assignment to check whether a newly entered address already exists in my database. The goal is to prevent members with the same address from registering again, so I want to match the new address intelligently against the ones already stored. For example, "152 Skyline Drive" is similar to "152, skyln Dr", but "153 Skyline Drive" is a different address.
I am currently using similar_text() for this, but it only compares the strings and ignores the simple fact that 152 and 153 are different addresses. All it gives me is a matching percentage, so this method has more drawbacks than benefits.
Can anyone help?
Upvotes: 1
Views: 588
Reputation: 4180
Your best bet would be to store different address components (house #, street, city, state, country) in separate fields and run your query against that.
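As a rough illustration of the separate-fields idea, here is a minimal sketch in Python using an in-memory SQLite database. The table name members, its columns, the helper find_candidates and the sample row are all hypothetical, chosen only for the example; the same query shape should carry over to whatever database and language the assignment uses.

```python
import sqlite3

# Hypothetical schema: one column per address component.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members ("
    "id INTEGER PRIMARY KEY, house_number TEXT, street TEXT, city TEXT)"
)
# Made-up sample row, stored in a normalised (lowercase) form.
conn.execute(
    "INSERT INTO members (house_number, street, city) VALUES (?, ?, ?)",
    ("152", "skyline drive", "springfield"),
)


def find_candidates(conn, house_number, city):
    """Return (id, street) rows whose house number and city match exactly."""
    rows = conn.execute(
        "SELECT id, street FROM members WHERE house_number = ? AND city = ?",
        (house_number, city),
    )
    return rows.fetchall()


print(find_candidates(conn, "152", "springfield"))
print(find_candidates(conn, "153", "springfield"))
```

Because the house number is compared exactly, 153 can never match a stored 152, and any fuzzy comparison only has to be applied to the street names of the few candidate rows that come back.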
You could build a heuristic to extract these elements from a badly formatted address.
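One possible heuristic, sketched in Python under some loud assumptions: the address starts with the house number, a small hand-written abbreviation table is enough, and a similarity threshold of 0.8 is acceptable. The names parse_address, same_address and ABBREVIATIONS are invented for this example, and difflib.SequenceMatcher stands in for similar_text() as the string-similarity measure.

```python
import re
from difflib import SequenceMatcher

# Assumed abbreviation table; a real one would need to be much longer.
ABBREVIATIONS = {"dr": "drive", "st": "street", "ave": "avenue", "rd": "road"}

# Leading digits are taken as the house number, the rest as the street.
ADDRESS_RE = re.compile(r"^\s*(\d+)\s*,?\s*(.+?)\s*$")


def parse_address(raw):
    """Split a raw address into (house_number, normalised_street), or None."""
    match = ADDRESS_RE.match(raw)
    if match is None:
        return None
    number, street = match.groups()
    # Lowercase, replace punctuation with spaces, expand known abbreviations.
    words = re.sub(r"[^a-z0-9 ]", " ", street.lower()).split()
    street = " ".join(ABBREVIATIONS.get(word, word) for word in words)
    return number, street


def same_address(a, b, threshold=0.8):
    """House numbers must match exactly; streets only need to be similar."""
    parsed_a, parsed_b = parse_address(a), parse_address(b)
    if parsed_a is None or parsed_b is None:
        return False
    if parsed_a[0] != parsed_b[0]:
        return False
    ratio = SequenceMatcher(None, parsed_a[1], parsed_b[1]).ratio()
    return ratio >= threshold


print(same_address("152 Skyline Drive", "152, skyln Dr"))
print(same_address("153 Skyline Drive", "152 Skyline Drive"))
```

The key design choice is that the house number is never fuzzy-matched, which is what plain string similarity gets wrong. The threshold is a crude stand-in: it happens to accept "skyln" for "skyline", but on real data it will produce both false matches and misses.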
However, knowing that "skyln Dr" and "skyline drive" are the same requires some machine learning power. Unless you have a dataset available and your assignment explicitly requires it, do not go down that rabbit hole:
How to build a robust street address parser using a Recurrent Neural Network
Upvotes: 2