Maciek

Reputation: 1990

Compare the numbers within different strings using Python

I have two types of addresses:

Unit 5, 123 Fake Street Drive
123 Fake St Dr, Unit 5

How can I use Python to compare two addresses by the numbers they contain?

For example:

Unit 5, 123 Fake Street Drive -> [5,123]
123 Fake St Dr, Unit 5 -> [123,5]

TRUE

123 Fake Street Drive -> [123]
123 Fake St Dr, Unit 5 -> [123,5]

FALSE

Unit 5, 155 Fake Street Drive -> [5,155]
123 Fake St Dr, Unit 5 -> [123,5]

FALSE

All I have now is:

if bool(set([int(s) for s in address.split() if s.isdigit()]) & set([int(s) for s in address2.split() if s.isdigit()])):

I want to find out if one list of numbers is the same as another list of numbers regardless of the order.

Upvotes: 1

Views: 95

Answers (2)

Jean-François Fabre

Reputation: 140297

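You just have to build sets of the extracted numbers and compare them with ==. Set equality ignores order: two sets are equal when they contain exactly the same elements. Your current code uses &, which only checks whether the two sets share at least one number.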
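A small sketch of the difference, using the numbers from the question's third example (written by hand here rather than extracted):

```python
a = {5, 155}  # numbers in "Unit 5, 155 Fake Street Drive"
b = {123, 5}  # numbers in "123 Fake St Dr, Unit 5"

# & is set intersection: it is non-empty as soon as one number is shared
print(bool(a & b))  # True
# == requires both sets to contain exactly the same elements
print(a == b)       # False
```

So the intersection test reports a match for addresses that should compare as FALSE.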

Another problem is that str.split() splits only on whitespace, so "Unit 5, 123" produces the token "5,". isdigit() returns False for it, the 5 is silently dropped, and your sets aren't equal.
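To see this, here is a minimal sketch of what the question's split/isdigit approach extracts from each address. The helper name `digits_only` is hypothetical, and it returns a sorted list instead of a set only so the printed order is predictable:

```python
address = "Unit 5, 123 Fake Street Drive"
address2 = "123 Fake St Dr, Unit 5"

# str.split() with no argument splits on whitespace only,
# so the comma stays attached to the unit number.
print(address.split())
# ['Unit', '5,', '123', 'Fake', 'Street', 'Drive']

def digits_only(text):
    # Hypothetical helper mirroring the question's comprehension:
    # keep whitespace-separated tokens made entirely of digits.
    return sorted(int(s) for s in text.split() if s.isdigit())

print(digits_only(address))   # [123]  ('5,' fails isdigit())
print(digits_only(address2))  # [5, 123]
```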

I suggest using re.findall to extract the numbers, putting them into sets and comparing those. Use \d+, or \b\d+\b to avoid matching digits inside words (such as N2P):

import re

address = "Unit 5, 123 Fake Street Drive"
address2 = "123 Fake St Dr, Unit 5"

# \b...\b only matches numbers that stand alone, not digits inside words
pattern = r"\b\d+\b"
print(set(re.findall(pattern, address)) == set(re.findall(pattern, address2)))

This prints True, whereas if I change, remove, or add a new number in one of the addresses above, it prints False.
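As a quick check, a sketch that wraps the same set comparison in a function and runs it on the three cases from the question. The names `NUMBER`, `same_numbers` and `pairs` are hypothetical, chosen for illustration:

```python
import re

# Same pattern as above: standalone runs of digits only
NUMBER = re.compile(r"\b\d+\b")

def same_numbers(a, b):
    # Hypothetical helper: True if both strings contain the same set of numbers
    return set(NUMBER.findall(a)) == set(NUMBER.findall(b))

# The three cases from the question, with the expected result in each comment
pairs = [
    ("Unit 5, 123 Fake Street Drive", "123 Fake St Dr, Unit 5"),  # True
    ("123 Fake Street Drive", "123 Fake St Dr, Unit 5"),          # False
    ("Unit 5, 155 Fake Street Drive", "123 Fake St Dr, Unit 5"),  # False
]
for a, b in pairs:
    print(same_numbers(a, b))
```

The numbers stay as strings here, which is fine for comparison; convert with int() if "05" and "5" should count as the same number.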

As suggested in the comments, the approach above fails if a number is repeated in one string but not in the other: a set discards duplicates, so we could get a false positive.
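A minimal sketch of that false positive, borrowing the pair of addresses from the other answer on this page:

```python
import re

pattern = r"\b\d+\b"
a = "Unit 1, 1 street X"
b = "1 Street Y"

print(re.findall(pattern, a))  # ['1', '1']
print(re.findall(pattern, b))  # ['1']
# set() collapses ['1', '1'] to {'1'}, so the comparison wrongly succeeds
print(set(re.findall(pattern, a)) == set(re.findall(pattern, b)))  # True
```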

If that's an issue, replacing set with collections.Counter fixes it:

import collections
print(collections.Counter(re.findall(pattern, address)) == collections.Counter(re.findall(pattern, address2)))

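This works too: Counter is a dict subclass that maps each number to how many times it occurs, so two Counters compare equal only when every number appears the same number of times in both strings.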
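A sketch of the Counter version applied to the repeated-number case. The helper name `number_counts` is hypothetical:

```python
import re
from collections import Counter

pattern = r"\b\d+\b"

def number_counts(text):
    # Hypothetical helper: maps each number string to its number of occurrences
    return Counter(re.findall(pattern, text))

print(number_counts("Unit 1, 1 street X"))  # Counter({'1': 2})
# The duplicate now matters, so the false positive disappears
print(number_counts("Unit 1, 1 street X") == number_counts("1 Street Y"))  # False
# Order still does not matter
print(number_counts("Unit 5, 123 Fake Street Drive")
      == number_counts("123 Fake St Dr, Unit 5"))  # True
```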

Upvotes: 3

John Bograd

Reputation: 1

I recommend using a sorted list rather than a set. A set cannot distinguish between "Unit 1, 1 street X" and "1 Street Y", since the numbers of both reduce to {1}, but sorted lists can: [1, 1] versus [1].
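A minimal sketch of the sorted-list idea. This answer does not say how to extract the numbers, so the regex from the other answer is assumed here, and `sorted_numbers` is a hypothetical helper name:

```python
import re

pattern = r"\b\d+\b"  # assumed: borrowed from the other answer

def sorted_numbers(text):
    # Hypothetical helper: int() makes the sort numeric, and sorting
    # removes the effect of order while keeping duplicates.
    return sorted(int(n) for n in re.findall(pattern, text))

print(sorted_numbers("Unit 1, 1 street X"))  # [1, 1]
print(sorted_numbers("1 Street Y"))          # [1]
print(sorted_numbers("Unit 5, 123 Fake Street Drive")
      == sorted_numbers("123 Fake St Dr, Unit 5"))  # True
```

Comparing sorted lists gives the same result as comparing Counters; converting to int additionally treats "05" and "5" as the same number, which may or may not be what you want for addresses.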

Upvotes: 0
