Mill

Reputation: 317

re.compile does not work properly

I'm trying to find a tag using bs4 where the text has the format 'Firma: ...........'. The problem is that re.compile does not work for this at all, and I can't figure out what I am doing wrong.

Here is the HTML:

<span class="date">
    Firma:
    <b>Agedr js</b>
</span>

Here is the code to find this tag:

re.DOTALL
attributes = soup.findAll('span', class_='date')
for attribute in attributes:
    if attribute == re.compile('Firma: .*'):
        firma = attribute.text
        print firma

I suppose there is some special character in the text 'Firma: ', but I can't find it. Where could the problem be?

EDIT: Things I tried that don't work:

Using re.compile('Firma.*').

Adding re.DOTALL.

Switching if attribute == ... to if attribute.contents[0] == ...

Upvotes: 0

Views: 4187

Answers (1)

falsetru

Reputation: 369394

The code compares the compiled pattern object with a Tag object. That comparison will always fail.

>>> import re
>>> re.compile('a') == 'a'  # PatternObject == str  => always false
False
>>> re.compile('a').search('a')
<_sre.SRE_Match object at 0x0000000002933168>
>>> re.search('a', 'a')
<_sre.SRE_Match object at 0x00000000029331D0>

You should use PatternObject.search (or re.search) with a str (I slightly modified the pattern so that it does not include the space):

if re.compile('Firma:.*').search(attribute.text):
    firma = attribute.text
    print firma
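
To see why the space matters, here is a minimal sketch that uses only the re module. It assumes attribute.text comes back with the same whitespace as the HTML in the question; the variable names (span_text, with_space and so on) are made up for illustration. It also shows that re.DOTALL only has an effect when it is passed as a flag, not when it is written on a line by itself.

```python
import re

# Assumed value of attribute.text for the <span> in the question:
# "Firma:" is followed by a newline and indentation, not by a single space.
span_text = "\n    Firma:\n    Agedr js\n"

# A literal space after the colon never matches this text.
with_space = re.compile(r"Firma: .*")
print(with_space.search(span_text))  # None

# Without the space the pattern matches, but "." stops at the newline.
without_space = re.compile(r"Firma:.*")
print(repr(without_space.search(span_text).group()))

# re.DOTALL must be passed as a flag; then "." also matches newlines.
dotall = re.compile(r"Firma:.*", re.DOTALL)
print(repr(dotall.search(span_text).group()))

# \s* skips whitespace (including newlines) before the company name.
with_name = re.compile(r"Firma:\s*(.+)")
company = with_name.search(span_text).group(1).strip()
print(company)
```

The last pattern is probably the most useful one if the goal is the company name rather than the whole span text.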

But for this simple case, you'd be better off using the in operator:

if 'Firma:' in attribute.text:
    ....
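
As a rough sketch of what the body of that if could look like without any regex, the following uses str.partition to take whatever follows the label. The helper name extract_after_label is hypothetical, and the sample text assumes the whitespace from the HTML in the question.

```python
def extract_after_label(text, label):
    """Return the stripped text after `label`, or None if `label` is absent."""
    if label not in text:
        return None
    # partition splits on the first occurrence: (before, label, after)
    _, _, rest = text.partition(label)
    return rest.strip()


# Assumed value of attribute.text for the <span> in the question.
span_text = "\n    Firma:\n    Agedr js\n"

firma = extract_after_label(span_text, "Firma:")
print(firma)
```

In the loop from the question, the same helper could be called with attribute.text for each span, skipping the spans where it returns None.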

Upvotes: 1
