Reputation: 4255
I have the following regular expression:
import re
re.compile('|'.join([pattern1, pattern2, pattern3]))
I would like it to work in the following way:
pattern1; if matched, stop; else proceed.
pattern2; if matched, stop; else proceed.
pattern3; stop.
However, currently it matches all of them.
I found this Q/A, which I thought answered my question, but adding flags=re.I does not fix my issue: my result does not change.
Is this possible, and if so, how?
A reproducible example:
import re
from bs4 import BeautifulSoup
xml_doc = """
<m3_commodity_group commodity3="Oilseeds"><m3_year_group_Collection><m3_year_group market_year3="2011/12"><m3_month_group_Collection><m3_month_group forecast_month3=""><m3_attribute_group_Collection><m3_attribute_group attribute3="Output"><Textbox40><Cell cell_value3="353.93"/></Textbox40></m3_attribute_group><m3_attribute_group attribute3="Total
Supply"><Textbox40><Cell cell_value3="429.49"/></Textbox40></m3_attribute_group><m3_attribute_group attribute3="Trade"><Textbox40><Cell cell_value3="73.59"/></Textbox40></m3_attribute_group><m3_attribute_group attribute3="Total
Use 2/"><Textbox40><Cell cell_value3="345.49"/></Textbox40></m3_attribute_group><m3_attribute_group attribute3="Ending
Stocks"><Textbox40><Cell cell_value3="59.03"/></Textbox40></m3_attribute_group></m3_attribute_group_Collection><m3_value_group_Collection><m3_value_group><m3_attribute_group_Collection><m3_attribute_group attribute3="Output"><Textbox40><Cell Textbox44="filler"/></Textbox40></m3_attribute_group><m3_attribute_group attribute3="Total
Supply"><Textbox40><Cell Textbox44="filler"/></Textbox40></m3_attribute_group><m3_attribute_group attribute3="Trade"><Textbox40><Cell Textbox44="filler"/></Textbox40></m3_attribute_group><m3_attribute_group attribute3="Total
Use 2/"><Textbox40><Cell Textbox44="filler"/></Textbox40></m3_attribute_group><m3_attribute_group attribute3="Ending
Stocks"><Textbox40><Cell Textbox44="filler"/></Textbox40></m3_attribute_group></m3_attribute_group_Collection></m3_value_group></m3_value_group_Collection></m3_month_group></m3_month_group_Collection></m3_year_group></m3_year_group_Collection></m3_commodity_group>
"""
soup = BeautifulSoup(xml_doc, "xml")
# This gives 11 values.
len(soup.find_all(re.compile('|'.join([
r'^m[0-9]_commodity_group$',r'^m[0-9]_region_group$',r'^m[0-9]_attribute_group$'
]), flags=re.I)))
# This gives 1 value <-- It's what I want, but I want to achieve it with the regex from above (which would work for other texts)
len(soup.find_all(re.compile('|'.join([
r'^m[0-9]_commodity_group$'
]), flags=re.I)))
# This gives 10 values, which in this example I'd like to be ignored, since the first regex already gave results.
len(soup.find_all(re.compile('|'.join([
r'^m[0-9]_attribute_group$'
]), flags=re.I)))
Upvotes: 1
Views: 349
Reputation: 184
Instead of compiling all the regexes together, you could iterate through the list with a for loop and break as soon as you find a match.
import re

regexList = ['[abc]', '[def]', '[ghi]']
text = input()
for r in regexList:
    # findall returns a list of all matches; an empty list is falsy
    mo = re.findall(r, text)
    if mo:
        break
If you only want to find one result from the regex, you can use the search function from the re module, which is part of the Python standard library.
regexList = ['[abc]', '[def]', '[ghi]']
text = input()
for r in regexList:
    # search returns the first match object, or None if there is no match
    mo = re.search(r, text)
    if mo:
        break
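To show why the loop behaves differently from one joined pattern, here is a minimal sketch that wraps the loop in a function. The name first_match and the sample text 'hide' are made up for illustration; they are not from the question. A joined alternation returns the leftmost match in the text, whichever alternative produces it, while the loop gives priority to the order of the list.

```python
import re

def first_match(patterns, text):
    """Return (pattern, match) for the first pattern in list order that matches, else None."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return pattern, match
    return None

regex_list = ['[abc]', '[def]', '[ghi]']

# Loop: '[abc]' finds nothing in 'hide', so '[def]' wins with 'd'.
hit = first_match(regex_list, 'hide')
print(hit[0], hit[1].group())  # [def] d

# Joined alternation: the leftmost character that any alternative accepts wins.
joined = re.search('|'.join(regex_list), 'hide')
print(joined.group())  # h
```

If none of the patterns match, the function returns None, so check the result before unpacking it.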
Upvotes: 0
Reputation: 1884
You could restructure your search:
patterns = [r'^m[0-9]_commodity_group$', r'^m[0-9]_region_group$', r'^m[0-9]_attribute_group$']
for pattern in patterns:
    result = soup.find_all(re.compile(pattern, flags=re.I))
    if result:
        break  # Stop after the first time you found a match
else:
    result = None  # When there never was a match
That might be more readable than regex magic. If you will be executing this a lot, you might want to pre-compile your regexes once instead of at every loop iteration.
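A rough sketch of the pre-compiled variant could look like the following. The names PATTERNS, find_first_group and find_by_name are hypothetical, and the list of tag names is a stand-in for the parsed document so that the snippet runs without BeautifulSoup. The patterns themselves are the ones from the question.

```python
import re

# Compiled once, in priority order (patterns taken from the question).
PATTERNS = [
    re.compile(p, flags=re.I)
    for p in (
        r'^m[0-9]_commodity_group$',
        r'^m[0-9]_region_group$',
        r'^m[0-9]_attribute_group$',
    )
]

def find_first_group(find, patterns=PATTERNS):
    """Call find(pattern) for each pattern; return the first non-empty result, else None."""
    for pattern in patterns:
        result = find(pattern)
        if result:
            return result
    return None

# Stand-in for soup.find_all: filters a plain list of tag names (assumed data).
tag_names = ['m3_commodity_group', 'm3_attribute_group', 'm3_attribute_group']

def find_by_name(pattern):
    return [name for name in tag_names if pattern.search(name)]

result = find_first_group(find_by_name)
print(result)  # ['m3_commodity_group']
```

With the soup object from the question, passing soup.find_all as the find argument should behave like the loop above, since find_all accepts a compiled regex as shown there.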
Upvotes: 4