Reputation: 1990
I have two types of addresses:
Unit 5, 123 Fake Street Drive
123 Fake St Dr, Unit 5
How can I use Python to compare the two addresses by the numbers?
For example:
Unit 5, 123 Fake Street Drive -> [5,123]
123 Fake St Dr, Unit 5 -> [123,5]
TRUE
123 Fake Street Drive -> [123]
123 Fake St Dr, Unit 5 -> [123,5]
FALSE
Unit 5, 155 Fake Street Drive -> [5,155]
123 Fake St Dr, Unit 5 -> [123,5]
FALSE
All I have now is:
if bool(set([int(s) for s in address.split() if s.isdigit()]) & set([int(s) for s in address2.split() if s.isdigit()])):
I want to find out if one list of numbers is the same as another list of numbers regardless of the order.
Upvotes: 1
Views: 95
Reputation: 140297
You just have to build sets of the extracted numbers and compare them with ==. set supports equality comparison, and the order of the elements does not matter.
Another problem here is that str.split() only splits on whitespace, so a token such as "5," keeps its trailing comma. isdigit() then returns False for that token, the number is dropped, and your sets aren't equal.
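To make the split() problem concrete, here is a small sketch using the first address from the question. The variable names tokens and digits_only are just illustrative.

```python
address = "Unit 5, 123 Fake Street Drive"

# str.split() with no arguments splits on whitespace only,
# so the comma stays attached to the unit number.
tokens = address.split()
print(tokens)  # ['Unit', '5,', '123', 'Fake', 'Street', 'Drive']

# '5,' is not made up entirely of digits, so isdigit() rejects it
# and the unit number silently disappears.
digits_only = [t for t in tokens if t.isdigit()]
print(digits_only)  # ['123']
```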
Let me suggest re.findall to find the numbers, put them into sets and compare. Use \d+ as the pattern, or \b\d+\b to avoid matching digits inside words (like N2P for instance).
import re

address = "Unit 5, 123 Fake Street Drive"
address2 = "123 Fake St Dr, Unit 5"
pattern = r"\b\d+\b"  # whole numbers only, not digits embedded in words
print(set(re.findall(pattern, address)) == set(re.findall(pattern, address2)))
This yields True, whereas if I change, add, or remove a number in one of the address strings above, I get False.
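The difference between the two patterns can be seen on a string that contains digits inside words. The sample text below is made up for illustration: N2P comes from the example above, and 1A4 is a hypothetical second half of a postal code.

```python
import re

# Hypothetical address with a postal code appended.
text = "Unit 5, 123 Fake St, N2P 1A4"

# \d+ picks up every run of digits, including those inside N2P and 1A4.
loose = re.findall(r"\d+", text)
print(loose)  # ['5', '123', '2', '1', '4']

# \b\d+\b only matches digit runs that stand alone as a word.
strict = re.findall(r"\b\d+\b", text)
print(strict)  # ['5', '123']
```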
As pointed out in the comments, the approach above fails when a number is repeated in one string but not in the other: set discards duplicates, so the comparison can give a false positive.
If that's an issue, replacing set with collections.Counter (after import collections) fixes it:
collections.Counter(re.findall(pattern, address)) == collections.Counter(re.findall(pattern, address2))
Counter is a dict subclass that maps each number to how many times it occurs, so == compares the counts as well as the values.
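As a sketch of how this could be wrapped up, the helper below (same_numbers is a made-up name, not something from a library) runs the Counter comparison on the three cases from the question, plus one case with a repeated number where a plain set would give a false positive.

```python
import re
from collections import Counter

NUMBER = re.compile(r"\b\d+\b")  # standalone numbers only

def same_numbers(a, b):
    """Return True if both strings contain the same numbers, with the
    same multiplicity, regardless of order (hypothetical helper)."""
    return Counter(NUMBER.findall(a)) == Counter(NUMBER.findall(b))

print(same_numbers("Unit 5, 123 Fake Street Drive", "123 Fake St Dr, Unit 5"))  # True
print(same_numbers("123 Fake Street Drive", "123 Fake St Dr, Unit 5"))          # False
print(same_numbers("Unit 5, 155 Fake Street Drive", "123 Fake St Dr, Unit 5"))  # False

# Repeated number: the sets are equal, the Counters are not.
a, b = "Unit 1, 1 Street X", "1 Street X"
print(set(NUMBER.findall(a)) == set(NUMBER.findall(b)))  # True (false positive)
print(same_numbers(a, b))                                # False
```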
Upvotes: 3
Reputation: 1
I recommend that you use a sorted list, not a set. A set cannot distinguish between "Unit 1, 1 street X" and "1 Street Y", because it keeps only one copy of the repeated 1, but a sorted list can.
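A minimal sketch of the sorted-list idea, assuming the numbers are extracted with a regular expression and converted to int so that they sort numerically. The name sorted_numbers is just illustrative.

```python
import re

def sorted_numbers(address):
    """Extract standalone numbers and return them as a sorted list of ints
    (hypothetical helper; duplicates are kept, unlike with a set)."""
    return sorted(int(n) for n in re.findall(r"\b\d+\b", address))

a = "Unit 1, 1 street X"
b = "1 Street Y"

print(sorted_numbers(a))  # [1, 1]
print(sorted_numbers(b))  # [1]

print(sorted_numbers(a) == sorted_numbers(b))            # False
print(set(sorted_numbers(a)) == set(sorted_numbers(b)))  # True, duplicates lost
```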
Upvotes: 0