Why does the code match only the first and last occurrences rather than all of them?

Question

I am trying to read phone numbers from the file below, which contains many phone numbers, using a regex:

import re
import pandas as pd  

url = "https://raw.githubusercontent.com/CoreyMSchafer/code_snippets/master/Python-Regular-Expressions/data.txt"
# the file contains multiple phone numbers

address = str(pd.read_fwf(url,header=None))
phoneno = re.compile(r"\d\d\d[-.]\d\d\d[-.]\d\d\d\d") # phone nos

# finditer returns an iterator of match objects
matches = phoneno.finditer(address)

for match in matches:
    print(match)

I expected multiple matches, but it gives just 2 matches.

Wiktor Stribiżew · Accepted Answer

The issue is that str(df) returns the DataFrame's display representation, which is truncated to show only some of the rows (here the first five and the last five):

>>> address = str(pd.read_fwf(url,header=None))
>>> print(address)
                                           0
0                                Dave Martin
1                               615-555-7164
2         173 Main St., Springfield RI 55924
3                  davemartin@bogusemail.com
4                             Charles Harris
..                                       ...
395                johnstuart@bogusemail.com
396                           Charles Miller
397                             900-555-6426
398  207 Washington St., Blackwater MA 24886
399             charlesmiller@bogusemail.com

[400 rows x 1 columns]

This string contains only two phone numbers (rows 1 and 397), which is exactly what you get.

You can get all of them by filtering the column directly:

data = pd.read_fwf(url,header=None)
matches = list(filter(phoneno.fullmatch, data[0]))
>>> matches
# => ['615-555-7164', '800-555-5669', '560-555-5153', '900-555-9340', '714-555-7405', '800-555-6771', '783-555-4799', '516-555-4615', '127-555-1867', '608-555-4938', '568-555-6051', '292-555-1875', '900-555-3205', '614-555-1166', '530-555-2676', '470-555-2750', '800-555-6089', '880-555-8319', '777-555-8378', '998-555-7385', '800-555-7100', '903-555-8277', '196-555-5674', '900-555-5118', '905-555-1630', '203-555-3475', '884-555-8444', '904-555-8559', '889-555-7393', '195-555-2405', '321-555-9053', '133-555-1711', '900-555-5428', '760-555-7147', '391-555-6621', '932-555-7724', '609-555-7908', '800-555-8810', '149-555-7657', '130-555-9709', '143-555-9295', '903-555-9878', '574-555-3194', '496-555-7533', '210-555-3757', '900-555-9598', '866-555-9844', '669-555-7159', '152-555-7417', '893-555-9832', '217-555-7123', '786-555-6544', '780-555-2574', '926-555-8735', '895-555-3539', '874-555-3949', '800-555-2420', '936-555-6340', '372-555-9809', '890-555-5618', '670-555-3005', '509-555-5997', '721-555-5632', '900-555-3567', '147-555-6830', '582-555-3426', '400-555-1706', '525-555-1793', '317-555-6700', '974-555-8301', '800-555-3216', '746-555-4094', '922-555-1773', '711-555-4427', '355-555-1872', '852-555-6521', '691-555-5773', '332-555-5441', '900-555-7755', '379-555-3685', '127-555-9682', '789-555-7032', '783-555-5135', '315-555-6507', '481-555-5835', '365-555-8287', '911-555-7535', '681-555-2460', '274-555-9800', '800-555-1372', '300-555-7821', '133-555-3889', '705-555-6863', '215-555-9449', '988-555-6112', '623-555-3006', '192-555-4977', '178-555-4899', '952-555-3089', '900-555-6426']

Each phone number is a separate item in the column. Hence, all you need is to keep the items that fully match your pattern.
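If you would rather keep the original finditer loop, a minimal sketch is to join the column values into one string yourself instead of calling str() on the DataFrame. The rows list here is a hypothetical stand-in for data[0], built from a few values in the output shown earlier, so the snippet runs without downloading the file:

```python
import re

phoneno = re.compile(r"\d{3}[-.]\d{3}[-.]\d{4}")

# Hypothetical stand-in for data[0]; with the real DataFrame you would
# build the text with "\n".join(data[0].astype(str)) instead.
rows = [
    "Dave Martin",
    "615-555-7164",
    "173 Main St., Springfield RI 55924",
    "davemartin@bogusemail.com",
    "Charles Miller",
    "900-555-6426",
    "207 Washington St., Blackwater MA 24886",
    "charlesmiller@bogusemail.com",
]

# Joining the items gives the full text, with no display truncation.
text = "\n".join(rows)

# finditer now sees every row, so every phone number is found.
found = [m.group() for m in phoneno.finditer(text)]
print(found)
```

Unlike the filter approach, this would also pick up a phone number embedded in a longer line, which may or may not be what you want.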

You can also make the regex a bit more compact by using quantifiers:

phoneno = re.compile(r"\d{3}[-.]\d{3}[-.]\d{4}")

The .fullmatch method returns a match object (which is truthy) only if the whole string matches the regex pattern, and None otherwise, which is why it works as a predicate for filter.
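To illustrate the difference between fullmatch and search, here is a small sketch. The sample strings are made up for the illustration and are not taken from the data file:

```python
import re

phoneno = re.compile(r"\d{3}[-.]\d{3}[-.]\d{4}")

# Hypothetical sample values, chosen to show the edge cases.
samples = [
    "615-555-7164",           # exactly a phone number
    "call 615-555-7164 now",  # phone number inside a longer string
    "615.555.7164",           # dots are allowed as separators
    "615-555-716",            # one digit short
]

# fullmatch: the entire string must match the pattern.
full = [s for s in samples if phoneno.fullmatch(s)]

# search: the pattern may match anywhere inside the string.
partial = [s for s in samples if phoneno.search(s)]

print(full)
print(partial)
```

Here full keeps only the first and third strings, while partial also keeps the second one, because search accepts a match anywhere in the string.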
