How do I extract the span and match from a regex search?

Question

Suppose I have the following data:

some_string = """
Dave Martin
615-555-7164
173 Main St., Springfield RI 559241122
davemartin101@exampledomain.com

Charles Harris
800-555-5669
969 High St., Atlantis VA 340750509
charlesharris101@exampledomain.com
"""

I used the following to find a pattern:

import re
pattern = re.compile(r'\d\d\d(-|\.)\d\d\d(-|\.)\d\d\d\d')
matches = pattern.finditer(some_string)

Printing each match object shows its span and match fields:

for match in matches:
    print(match)

I want to extract the span and match fields. I found the question "Extract part of a regex match", which shows how to use group():

nums = []
for match in matches:
    nums.append(match.group(0))

I get the following result:

print(nums)
['615-555-7164', '800-555-5669']

Similar to the other Stack Overflow thread above, how can I extract the span?

This thread was marked for deletion by someone and then it was deleted. The justification for deletion was that I was seeking advice on software... which I was not. https://i.imgur.com/sbCfekf.png

PIG208 · Accepted Answer

If you just want the tuple holding the start and end index of each match, use span(). Its parameter works the same way as the one for group(): both take a group index, and index 0 refers to the entire match (in your pattern, groups 1 and 2 are the two (-|\.) separator groups).

for match in matches:
    print(match.span(0))

Output:

(13, 25)
(113, 125)
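
To make the group index point concrete, here is a minimal sketch run against the sample data from the question. The names first, whole_span, sep1_span and sep2_span are only illustrative and not part of the original code.

```python
import re

# Sample data copied from the question.
some_string = """
Dave Martin
615-555-7164
173 Main St., Springfield RI 559241122
davemartin101@exampledomain.com

Charles Harris
800-555-5669
969 High St., Atlantis VA 340750509
charlesharris101@exampledomain.com
"""

pattern = re.compile(r'\d\d\d(-|\.)\d\d\d(-|\.)\d\d\d\d')

# Take only the first match from the iterator.
first = next(pattern.finditer(some_string))

whole_span = first.span(0)  # span of the entire match
sep1_span = first.span(1)   # span of the first (-|\.) group
sep2_span = first.span(2)   # span of the second (-|\.) group

# span(n) is the pair (start(n), end(n)); slicing with it reproduces group(n).
start, end = whole_span
assert (first.start(0), first.end(0)) == whole_span
assert some_string[start:end] == first.group(0)

print(whole_span, sep1_span, sep2_span)
```

Calling span() with no argument is the same as span(0), just as group() defaults to group(0).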

As for extracting the match fields, yes, your approach works fine. It is better to collect both the match fields and the spans in the same loop:

nums = []
spans = []
for match in matches:
    nums.append(match.group(0))
    spans.append(match.span(0))

Also, be aware that finditer returns an iterator, so once it has been consumed it yields nothing more. You need to call finditer again to get a new one if you want to iterate over the matches a second time.
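
A small sketch of that behaviour, assuming a made-up one-line sample string (text) rather than the data from the question. It also shows one possible workaround, which is to store the matches in a list so they can be reused:

```python
import re

# Hypothetical sample text, shortened for illustration.
text = "615-555-7164 and 800.555.5669"
pattern = re.compile(r'\d\d\d(-|\.)\d\d\d(-|\.)\d\d\d\d')

matches = pattern.finditer(text)
first_pass = [m.group(0) for m in matches]   # consumes the iterator
second_pass = [m.group(0) for m in matches]  # nothing left to yield

# Materialize the matches once so they can be traversed repeatedly.
match_list = list(pattern.finditer(text))
nums = [m.group(0) for m in match_list]
spans = [m.span(0) for m in match_list]

print(first_pass, second_pass)
print(nums, spans)
```

Storing the match objects in a list costs some memory on very large inputs, so calling finditer again may be preferable there.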
