Find all matches between two strings with regex

Question

I am just starting to use regex for the first time and am trying to use it to parse some data from an HTML table. I am trying to grab everything between the and tags, and then make a similar regex again to create a JSON array.

I tried using this, but it matches only the first group and none of the rest.

(.*?)

How do I make that find all matches between those tags?

zx81 · Accepted Answer

Although using regex for this job is a bad idea (there are many ways for things to go wrong), your pattern is basically correct.

Returning All Matches with Python

The question then becomes how to return all matches or capture groups in Python. There are two basic ways:

finditer
findall

With finditer

# regex is a compiled pattern (from re.compile); subject is the string to search
for match in regex.finditer(subject):
    print("The Overall Match: ", match.group(0))
    print("Group 1: ", match.group(1))
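To make the finditer loop concrete, here is a minimal, self-contained sketch. The sample string and the `<td>`...`</td>` delimiters are assumptions chosen purely for illustration, not necessarily the tags from the question; substitute whatever tags you are actually matching.

```python
import re

# Hypothetical input; the <td> delimiters are an assumption for illustration.
subject = "<tr><td>alpha</td><td>beta</td><td>gamma</td></tr>"
regex = re.compile(r"<td>(.*?)</td>")

# finditer yields one match object per non-overlapping occurrence,
# so group(0) (the overall match) and group(1) are both available.
overall = [match.group(0) for match in regex.finditer(subject)]
cells = [match.group(1) for match in regex.finditer(subject)]

print(overall)  # ['<td>alpha</td>', '<td>beta</td>', '<td>gamma</td>']
print(cells)    # ['alpha', 'beta', 'gamma']
```

Because each item is a full match object, no extra parentheses are needed to get at the overall match.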

With findall

findall is a bit strange. When the pattern has capture groups, findall returns only the groups, not the overall match. To access both the capture groups and the overall match, you have to wrap your original regex in parentheses (so that the overall match is captured too). In your case, if you wanted to be able to access both the outside of the tags and the inside (which you captured with Group 1), your regex would become: ((.*?)). Each result is then a tuple with one entry per group. Then you do:

matches = regex.findall(subject)
# With the wrapped regex, each item is a tuple: (overall match, original Group 1)
for match in matches:
    print("The Overall Match: ", match[0])
    print("Group 1: ", match[1])
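The difference between the unwrapped and wrapped patterns can be seen in a short sketch. As before, the sample string and the `<td>`...`</td>` delimiters are assumptions for illustration only.

```python
import re

# Hypothetical input; the <td> delimiters are an assumption for illustration.
subject = "<td>alpha</td><td>beta</td>"

# One capture group: findall returns a list of strings (the group contents only).
inner_only = re.findall(r"<td>(.*?)</td>", subject)

# Original pattern wrapped in an outer group: findall returns a list of tuples,
# where index 0 is the overall match and index 1 is the inner group.
pairs = re.findall(r"(<td>(.*?)</td>)", subject)

for whole, inner in pairs:
    print("The Overall Match:", whole)
    print("Inner group:", inner)
```

Here `inner_only` is `['alpha', 'beta']`, while `pairs` is `[('<td>alpha</td>', 'alpha'), ('<td>beta</td>', 'beta')]`. If you only need the text between the tags, the unwrapped pattern is enough.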
