Combining two REGEX in Python for compiling

Question

I'm using REGEX to compile a list of strings from an HTML document in Python. The strings are found either inside a td tag (`<td>SOME OF THE STRINGS COULD BE HERE</td>`) or inside a div tag (`<div>SOME STRINGS COULD ALSO BE HERE</div>`).

Since the order of the strings inside the final list should correspond to the order in which they appear inside the HTML document, I am looking for a REGEX that will allow me to compile all of these strings considering both possible cases.

I know how to do it individually with something that looks like:

import re
FindStrings = re.compile(r'(?<=<td>)(.*?)(?=</td>)')
MyList = FindStrings.findall(str(mydocument))

for the first case, but I would like to know the most efficient way to combine both cases into a single REGEX.

Federico Piazza · Accepted Answer

You can combine the two patterns with regex alternation (`|`). Btw, you don't need lookarounds: a capturing group around the text you want is enough.
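To illustrate the point about lookarounds, here is a minimal sketch comparing the two styles on the td case. The sample string `html` is an assumption made for illustration, not the asker's real document.

```python
import re

# Assumed sample input; only the <td> case from the question is shown.
html = "<td>SOME OF THE STRINGS COULD BE HERE</td>"

# Lookaround version: the tags are asserted but not consumed by the match.
with_lookarounds = re.findall(r"(?<=<td>)(.*?)(?=</td>)", html)

# Capturing-group version: the tags are consumed, and findall returns
# only the contents of the single group.
with_group = re.findall(r"<td>(.*?)</td>", html)

print(with_lookarounds == with_group)
```

Both calls should return the same one-element list, since `re.findall` returns the group contents whenever the pattern has exactly one capturing group.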

You can use this regex:

<td>(.+?)</td>|<div>(.+?)</div>

Match information from the working demo:

MATCH 1
1.  [4-37]  `SOME OF THE STRINGS COULD BE HERE`
MATCH 2
2.  [94-125]    `SOME STRINGS COULD ALSO BE HERE`

Code:

>>> import re
>>> s = """<td>SOME OF THE STRINGS COULD BE HERE</td>
... <div>SOME STRINGS COULD ALSO BE HERE</div>
... """
>>> m = re.findall(r'<td>(.+?)</td>|<div>(.+?)</div>', s)
>>> m
[('SOME OF THE STRINGS COULD BE HERE', ''), ('', 'SOME STRINGS COULD ALSO BE HERE')]
>>> # Flatten the tuples, keeping only the non-empty group from each match
>>> [g for groups in m for g in groups if g]
['SOME OF THE STRINGS COULD BE HERE', 'SOME STRINGS COULD ALSO BE HERE']
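If the flattening step feels awkward, one possible variation (not part of the original answer) is to capture the tag name once and reuse it with a backreference, so each match carries its text in a single group. This is only a sketch: the sample string `html` and the variable names are assumptions, and it expects plain `<td>` and `<div>` tags without attributes.

```python
import re

# Hypothetical sample document; the tag layout is assumed for illustration.
html = (
    "<td>SOME OF THE STRINGS COULD BE HERE</td>\n"
    "<div>SOME STRINGS COULD ALSO BE HERE</div>\n"
)

# Group 1 captures the tag name; \1 requires the same name in the closing tag.
pattern = re.compile(r"<(td|div)>(.+?)</\1>")

# finditer yields matches in document order; group(2) is the enclosed text.
strings = [match.group(2) for match in pattern.finditer(html)]
print(strings)
```

Because `finditer` scans left to right, the resulting list keeps the order in which the strings appear in the document, and no empty placeholders need to be filtered out.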
