Milano

Reputation: 18735

Regex use in BeautifulSoup's find

I'm trying to use a regex in a find call. I want to find a span tag whose text contains 'example'.

I've already tried this:

place = infoFrame.find('span',text = re.compile('.*example.*:'))

I'm getting this error:

UnboundLocalError: local variable 're' referenced before assignment

Which is quite weird, because I have import re at the top of the file. The line above is inside a method of a class.

I know there is another way (find all span tags and then check whether each one contains 'example'), but I'm curious how to do it with a regex inside the find call.

Can you tell me what is wrong?

Upvotes: 1

Views: 210

Answers (1)

Martijn Pieters

Reputation: 1122242

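Elsewhere in your function you are using re = ... or import re. In other words, you are using re as a local variable too, and Python then treats the name as local for the whole function body, including lines that run before the assignment. Rename or remove that local variable: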

>>> from bs4 import BeautifulSoup
>>> import re
>>> soup = BeautifulSoup('<span>example: foo</span>')
>>> def find_span():
...     return soup.find('span', text=re.compile('.*example.*:'))
...     re = 'oops'
... 
>>> find_span()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in find_span
UnboundLocalError: local variable 're' referenced before assignment
>>> def find_span():
...     return soup.find('span', text=re.compile('.*example.*:'))
... 
>>> find_span()
<span>example: foo</span>
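
The session above triggers the error with an assignment. As a minimal sketch of the other case mentioned, a stray import re inside the function has the same effect; the function names broken and fixed are hypothetical and only illustrate the scoping rule:

```python
import re


def broken(text):
    # The `import re` further down binds `re` as a local name for the
    # whole function body, so this call runs before that name is bound.
    match = re.search('example', text)
    import re  # hypothetical stray local import
    return match


def fixed(text):
    # No local binding of `re`, so the module-level import is used.
    return re.search('example', text)


error = None
try:
    broken('an example: foo')
except UnboundLocalError as exc:
    error = exc

print(type(error).__name__)
print(fixed('an example: foo') is not None)
```

The exact wording of the error message differs between Python versions, but the exception type is UnboundLocalError in both cases.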

Your use of re.compile() is otherwise fine. You can remove the first .* though: BeautifulSoup applies the pattern with search(), so it already matches anywhere in the text, and the leading .* only adds heavy backtracking. For any element with a lot of text and no example text in it, the pattern will be needlessly slow otherwise. You can also make the second * non-greedy by appending ?, so the match stops at the first colon:

place = infoFrame.find('span', text=re.compile('example.*?:'))
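
As a quick check that the shorter pattern finds the same text, here is a small sketch using only the re module. It assumes the pattern is applied with search(), and the sample strings are made up for illustration:

```python
import re

# Original pattern from the question and the simplified one.
greedy = re.compile('.*example.*:')
lean = re.compile('example.*?:')

# Hypothetical sample texts, not taken from any real page.
samples = [
    'example: foo',
    'an example here: bar',
    'no match here',
    'example without colon',
]

# For each sample, record whether each pattern matches somewhere in it.
results = [(bool(greedy.search(s)), bool(lean.search(s))) for s in samples]

for sample, (old, new) in zip(samples, results):
    print(sample, old, new)
```

Both patterns agree on every sample here; the shorter one simply gives the regex engine less to retry on text that does not contain example.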

Upvotes: 1
