Reputation: 4924
I just hit a snag with a regex and have no idea why it isn't working.
Here is what the BeautifulSoup documentation says:
soup.find_all(class_=re.compile("itl"))
# [<p class="title"><b>The Dormouse's story</b></p>]
Here is my html:
<a href="exam.com" title="Keeper: Jay" class="pos_text">Aouate</a></span><span class="pos_text pos3_l_4">
and I'm trying to match the span tag (last position).
>>> if soup.find(class_=re.compile("pos_text pos3_l_\d{1}")):
...     print "Yes"
# prints nothing - indicating there is no such pattern in the html
So, I'm just repeating the BS4 docs, except my regex is not working. Sure enough, if I replace the \d{1} with 4 (as originally in the html), it succeeds.
Upvotes: 3
Views: 165
Reputation: 6807
You are not matching a single class but a specific combination of classes in a specific order.
From the documentation:
You can also search for the exact string value of the class attribute:
css_soup.find_all("p", class_="body strikeout")
# [<p class="body strikeout"></p>]
But searching for variants of the string value won’t work:
css_soup.find_all("p", class_="strikeout body")
# []
So you should probably first match on pos_text, and then run your regexp against the classes of the tags that search returns.
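A rough sketch of that two-step approach, in case it helps. The markup is modelled on the question, but the opening span, the closing tags and the text "target" are my own assumptions, and I use the built-in html.parser so nothing beyond bs4 is needed. Treat it as one way to do it, not the only one:

```python
import re
from bs4 import BeautifulSoup

# Markup modelled on the question. The first <span>, the closing tags
# and the text "target" are assumed for the sake of a complete example.
html = (
    '<span><a href="exam.com" title="Keeper: Jay" '
    'class="pos_text">Aouate</a></span>'
    '<span class="pos_text pos3_l_4">target</span>'
)
soup = BeautifulSoup(html, "html.parser")

# Step 1: find every tag that carries the single class "pos_text".
candidates = soup.find_all(class_="pos_text")

# Step 2: keep only the tags that also carry a class like pos3_l_<digit>.
# Each class name is tested on its own, so class order does not matter.
position_class = re.compile(r"^pos3_l_\d$")
matches = [
    tag for tag in candidates
    if any(position_class.match(cls) for cls in tag.get("class", []))
]

for tag in matches:
    print(tag.name, tag["class"])
```

Because every class name is checked individually, this does not depend on how a given BeautifulSoup version treats a regex that spans the whole class attribute.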
Upvotes: 1
Reputation: 27227
I'm not entirely sure, but this worked for me:
soup.find(attrs={'class':re.compile('pos_text pos3_l_\d{1}')})
Upvotes: 2
Reputation: 20747
Try "\\d" in your regex. It's probably interpreting "\d" as trying to escape 'd'.
Alternatively, a raw string ought to work. Just put an 'r' in front of the regex, like this:
re.compile(r"pos_text pos3_l_\d{1}")
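A small check with only the standard re module, as a sketch of what the raw prefix changes. As far as I know, Python leaves a backslash it does not recognize (such as the one in \d) in the string unchanged, though newer versions warn about it, so the raw string is good practice but may not be the whole explanation here. The variable names are just for illustration:

```python
import re

# "\\d" and r"\d" are the same two-character string: a backslash and a "d".
escaped = "\\d"
raw = r"\d"
same_pattern = escaped == raw

# The raw prefix does matter for escapes Python recognizes, such as \b:
# "\b" is one backspace character, r"\b" is a backslash followed by "b".
backspace_len = len("\b")
word_boundary_len = len(r"\b")

# Under plain re, the question's pattern matches the full class string.
pattern = re.compile(r"pos_text pos3_l_\d{1}")
found = pattern.search("pos_text pos3_l_4") is not None

print(same_pattern, backspace_len, word_boundary_len, found)
```

If the pattern matches here but not through soup.find, the difference is more likely in how BeautifulSoup compares the regex against the class attribute than in the escaping.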
Upvotes: 2