user2518751

Reputation: 735

In Python, if a string starts with one of the values in a tuple, I also need to know which value matched

I have a file of area codes that I read into a tuple:

area_codes = []
for line1 in area_codes_file:
    # Search once and reuse the match object
    match = area_code_extract.search(line1)
    if match:
        area_codes.append(match.group())
area_codes = tuple(area_codes)

I also have a file full of phone numbers that I read into Python. If a phone number starts with one of the area codes in the tuple, I need to do two things: (1) keep the number, and (2) know which area code it matched, because I need to put the area code in brackets.

So far, I was only able to do 1:

for line in txt.readlines():
    is_number = phonenumbers.parse(line,"GB")
    if phonenumbers.is_valid_number(is_number):
        if line.startswith(area_codes):
            print(line)

How do I do the second part?

Upvotes: 1

Views: 621

Answers (1)

ShadowRanger

Reputation: 155506

The simple (if not necessarily highest performance) approach is to check each prefix individually, and keep the first match:

for line in txt:
    is_number = phonenumbers.parse(line,"GB")
    if phonenumbers.is_valid_number(is_number):
        if line.startswith(area_codes):
            print(line, next(filter(line.startswith, area_codes)))

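Since the startswith check has already succeeded, we know filter(line.startswith, area_codes) will get at least one hit, so we just pull the first hit using next. If some area codes are prefixes of others, that is whichever one comes first in the tuple.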

Note: On Python 2, you should start the file with from future_builtins import filter to get the generator-based filter (which also saves work by stopping the search at the first hit). Python 3's filter already behaves like this.
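
As a minimal Python 3 sketch of the same idea, the prefix lookup can be wrapped in a helper that passes a default to next, so a miss returns None and the separate startswith check is no longer needed. The sample area codes, the phone numbers, and the helper name matching_prefix are assumptions made up for illustration, and the phonenumbers validation is left out to keep the sketch self-contained:

```python
# Hypothetical sample prefixes; in the question these come from the area codes file
area_codes = ("020", "0161", "0113")

def matching_prefix(line, prefixes):
    """Return the first prefix in `prefixes` that `line` starts with, or None."""
    # The second argument to next() is returned when filter() yields nothing
    return next(filter(line.startswith, prefixes), None)

for line in ("02079460000", "01614960000", "09998887777"):
    code = matching_prefix(line, area_codes)
    if code is not None:
        # Put the matched area code in brackets, as the question asks
        print("({}) {}".format(code, line[len(code):]))
```

Lines that match none of the prefixes are simply skipped.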

For potentially higher performance, the way to both test all prefixes at once and figure out which value hit is to use regular expressions:

import re

# Function that will match any of the given prefixes returning a match obj on hit
area_code_matcher = re.compile(r'|'.join(map(re.escape, area_codes))).match
for line in txt:
    is_number = phonenumbers.parse(line,"GB")
    if phonenumbers.is_valid_number(is_number):
        # Returns None on miss, match object on hit
        m = area_code_matcher(line)
        if m is not None:
            # Whatever matched is in the 0th grouping
            print(line, m.group())
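
One caveat with the alternation: Python's re tries alternatives left to right and keeps the first one that matches, not the longest. If some area codes are prefixes of others, sorting the codes longest first before joining makes the most specific code win. The following is a sketch under that assumption; the sample codes and the helper name bracket_area_code are hypothetical:

```python
import re

# Hypothetical sample in which one code ("01539") is a prefix of another ("015394")
area_codes = ("01539", "015394", "020")

# Longest alternatives first, so the regex prefers the most specific prefix
pattern = "|".join(map(re.escape, sorted(area_codes, key=len, reverse=True)))
area_code_matcher = re.compile(pattern).match

def bracket_area_code(line):
    """Return the line with its area code in brackets, or None on a miss."""
    m = area_code_matcher(line)
    if m is None:
        return None
    # m.group() is the matched code; m.end() is where the rest of the number starts
    return "({}) {}".format(m.group(), line[m.end():].strip())

print(bracket_area_code("01539412345"))
```

Without the sort, "01539" would match first here and the longer code would never be reported.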

Lastly, there is one more approach you can use if the area codes all have the same fixed length. Rather than using startswith, you can slice the prefix off directly; you know which code hit because you sliced it off yourself:

# If there are a lot of area codes, using a set/frozenset will allow much faster lookup
area_codes_set = frozenset(area_codes)
for line in txt:
    is_number = phonenumbers.parse(line,"GB")
    if phonenumbers.is_valid_number(is_number):
        # Assuming lines that match always start with ###
        if line[:3] in area_codes_set:
            print(line, line[:3])
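
If the codes come in a small number of different lengths rather than exactly one, the slicing idea can still be used by trying each distinct length in turn. This is only a sketch of that variation, not part of the approach above as written; the sample codes and the helper name sliced_prefix are assumptions for illustration:

```python
# Hypothetical sample prefixes of mixed length
area_codes = ("020", "0161", "01539")
area_codes_set = frozenset(area_codes)

# Distinct prefix lengths, longest first, so the most specific code wins
lengths = sorted({len(code) for code in area_codes_set}, reverse=True)

def sliced_prefix(line):
    """Return the area code that `line` starts with, or None if there is none."""
    for n in lengths:
        # One set lookup per distinct length instead of one test per code
        if line[:n] in area_codes_set:
            return line[:n]
    return None

for line in ("01614960000", "02079460000", "09998887777"):
    code = sliced_prefix(line)
    if code is not None:
        print(line, code)
```

The cost per line depends on the number of distinct lengths, not on the number of area codes.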

Upvotes: 1
