Minduca

Reputation: 1181

How to check if two unstructured street address strings are the same?

I need to compare two unstructured addresses and be able to identify if they are the same (or similar enough).

Scenario

What I have found

I know we can use some fuzzy logic for this kind of comparison, with some tolerance for misspellings, but...

I do not want to reinvent the wheel. This seems like a common problem in many contexts, and I think there is an existing algorithm (perhaps with some slight modifications) that would fit this scenario.

Thanks in advance

Upvotes: 4

Views: 3230

Answers (1)

fgregg

Reputation: 3249

I've helped build some open source tools to do this.

Basically, the approach is to try to split an address into its constituent parts and then intelligently compare those parts.

Both parts of the problem are hard.
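To make the split-then-compare idea concrete, here is a minimal sketch using only the Python standard library. It is not how usaddress or dedupe work internally; the abbreviation table, the `split_address` and `same_address` helpers, and the 0.85 threshold are all assumptions made up for illustration, and the splitting rule only handles simple "number street unit" addresses.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {
    "st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard",
    "n": "north", "s": "south", "e": "east", "w": "west",
    "apt": "unit", "apartment": "unit", "ste": "unit", "suite": "unit",
}


def split_address(address):
    """Naively split an address into number, street, and unit parts."""
    # Lowercase, drop punctuation, and expand known abbreviations.
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]

    parts = {"number": "", "street": "", "unit": ""}
    # Assumption: a leading numeric token is the house number.
    if tokens and tokens[0].isdigit():
        parts["number"] = tokens.pop(0)
    # Assumption: everything after a unit marker is the unit designator.
    if "unit" in tokens:
        i = tokens.index("unit")
        parts["unit"] = " ".join(tokens[i + 1:])
        tokens = tokens[:i]
    parts["street"] = " ".join(tokens)
    return parts


def same_address(a, b, threshold=0.85):
    """Compare part by part: exact on numbers, fuzzy on the street name."""
    pa, pb = split_address(a), split_address(b)
    # A different house or unit number means a different place,
    # however similar the strings look overall.
    if pa["number"] != pb["number"] or pa["unit"] != pb["unit"]:
        return False
    ratio = SequenceMatcher(None, pa["street"], pb["street"]).ratio()
    return ratio >= threshold


print(same_address("123 N. Main St, Apt 4", "123 north main street apartment 4"))
print(same_address("123 Main St", "125 Main St"))
```

The point of comparing by part is that "123 Main St" and "125 Main St" differ by one character but are different addresses, while a misspelled street name usually is not. A naive splitter like this one breaks quickly on real data (for example, "St" can also mean "Saint"), which is why a trained parser is worth using.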

The first part is often called address parsing. Here's what we use: https://github.com/datamade/usaddress

The second part has many, many names, but let's call it fuzzy matching. Here's the library we made for that: https://github.com/datamade/dedupe
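For a feel of what a fuzzy comparison measures, here is a small sketch of one common building block, Levenshtein edit distance turned into a 0-to-1 similarity score. This is a generic textbook technique, not dedupe's API or its actual matching model; the `edit_distance` and `similarity` names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Scale edit distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


print(edit_distance("main stret", "main street"))
print(similarity("main stret", "main street"))
```

A single score like this tolerates misspellings, but it treats every character equally, so on its own it cannot tell that a changed house number matters more than a changed letter in the street name.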

We also provided some facilities for using them together: http://dedupe.readthedocs.io/en/latest/Variable-definition.html#address-type

Upvotes: 5
