Using regex on Beautiful Soup tags

Question

I have recently started using Beautiful Soup 4 and I am struggling to understand some of its basics (I was fine with bs3.x, for some reason). For example, let's start with something simple like:

data=soup.find_all('h2')

which yields me something like:

more-accurate-data

which is fine. But when I try to run a regex against the above string, using something along the lines of the following (assuming the above is stored in "temp"):

t=str(re.compile(r"""""").search(str(temp)).group(1))

I get:

AttributeError: 'NoneType' object has no attribute 'group'

which I find strange, because when I do something like this in the Python interpreter:

k=r"""more-accurate-data"""

and then use the above regex, everything works fine. I am wondering why the "tags" type generated by bs4 does not seem to work with regexes. Maybe I am doing something stupid, or maybe something has changed between bs3.x and bs4 that I am not aware of. Any help on this would be appreciated. Thanks.

Bakuriu · Accepted Answer

You should try to see the repr of the string:

>>> a=r"""more-accurate-data"""
>>> print repr(a)
'more-accurate-data'

And the regex works with this representation:

>>> regex = re.compile(r"""""")
>>> regex.match(a)
<_sre.SRE_Match object at 0x20fbf30>

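The problem is that the result from Beautiful Soup is different from the string you typed by hand, and you cannot see the difference because you did not print its repr. A plain print shows the rendered text, while repr exposes characters such as newlines and other whitespace that can stop a pattern from matching. When the pattern does not match, search returns None, and calling .group on None raises the AttributeError you are seeing. When dealing with regexes, it is a good idea to check the repr of the strings involved to avoid things like this.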
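To illustrate the repr check, here is a minimal sketch using only the standard re module. The markup and the regex from the question are not visible in the post, so both the sample string (with newlines inside the h2) and the pattern below are hypothetical stand-ins, and find_heading_text is a helper name made up for this example. It shows one way a string can look fine when printed and still fail to match, not necessarily what happened in your case.

```python
import re

# Hypothetical stand-in for str(tag): the real markup from the question is
# not shown, so the newlines inside the <h2> are an assumption.
sample = "<h2>\nmore-accurate-data\n</h2>"

# Hypothetical pattern; the original regex is not recoverable from the post.
PATTERN = r"<h2>(.*)</h2>"


def find_heading_text(html, flags=0):
    """Return the stripped text captured by PATTERN, or None if no match."""
    match = re.search(PATTERN, html, flags)
    if match is None:
        # search() returns None on failure; calling .group() on it is what
        # raises AttributeError: 'NoneType' object has no attribute 'group'.
        return None
    return match.group(1).strip()


print(sample)        # looks harmless when printed
print(repr(sample))  # repr exposes the embedded "\n" characters

print(find_heading_text(sample))             # no match: "." stops at newlines
print(find_heading_text(sample, re.DOTALL))  # match: "." now spans newlines
```

Checking for None before calling .group turns the confusing AttributeError into something you can handle explicitly, and comparing repr(str(temp)) against the string you tested by hand should show where the two differ.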
