Reputation: 317
I'm trying to find a tag using bs4 whose text has the format 'Firma: ...........'. The problem is that re.compile does not work for this at all, and I can't figure out what I am doing wrong.
Here is the HTML:
<span class="date">
Firma:
<b>Agedr js</b>
</span>
Here is the code I use to find this tag:
re.DOTALL
attributes = soup.findAll('span', class_='date')
for attribute in attributes:
    if attribute == re.compile('Firma: .*'):
        firma = attribute.text
        print firma
I suspect there is some special character in the text 'Firma: ', but I can't find it. Where could the problem be?
EDIT: Things I have tried that don't work:
- Using re.compile('Firma.*')
- Using re.DOTALL
- Switching if attribute == ... to if attribute.contents[0] == ...
Upvotes: 0
Views: 4187
Reputation: 369394
The code compares the compiled pattern object with a Tag object. That comparison is always false.
>>> import re
>>> re.compile('a') == 'a' # PatternObject == str => always false
False
>>> re.compile('a').search('a')
<_sre.SRE_Match object at 0x0000000002933168>
>>> re.search('a', 'a')
<_sre.SRE_Match object at 0x00000000029331D0>
You should use PatternObject.search (or re.search) with a str. I slightly modified the pattern so that it does not include the space, because in the HTML 'Firma:' is followed by a newline rather than a space:
if re.compile('Firma:.*').search(attribute.text):
    firma = attribute.text
    print firma
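To see why the space matters, here is a minimal sketch that uses only the standard library. The string text is an assumption: it is what attribute.text would most likely return for the span in the question, where 'Firma:' is followed by a newline.

```python
import re

# Assumed value of attribute.text for the <span> in the question:
# 'Firma:' is followed by a newline, not a space.
text = "\nFirma:\nAgedr js\n"

with_space = re.compile('Firma: .*')
without_space = re.compile('Firma:.*')

# Comparing a pattern object with a string is never true.
print(with_space == text)        # False

# The original pattern requires a space after the colon, so it finds nothing.
print(with_space.search(text))   # None

# Without the space the pattern matches; '.' stops at the newline.
match = without_space.search(text)
print(match.group(0))            # Firma:
```

Note that .* does not cross the newline unless re.DOTALL is passed to re.compile, so the match covers only 'Firma:' here. That is enough for a yes/no test inside the if.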
But for this simple case, you are better off using the in operator:
if 'Firma:' in attribute.text:
    ....
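The substring test can be sketched as a small filter. Everything named here is hypothetical: find_firma is an illustrative helper (not part of bs4), and texts is a made-up list standing in for the .text of each span returned by soup.findAll.

```python
def find_firma(texts, label='Firma:'):
    """Return the stripped texts that contain the label."""
    return [t.strip() for t in texts if label in t]

# Hypothetical stand-ins for attribute.text of each matching <span>.
texts = ["\nFirma:\nAgedr js\n", "\nOther:\nxyz\n"]

print(find_firma(texts))
```

With real bs4 objects, the same check would be applied to attribute.text inside the loop from the question.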
Upvotes: 1