Shivendra

Reputation: 1096

Python RegEx with Beautifulsoup 4 not working

I want to find all div tags whose class attribute matches a certain pattern, but my code does not work as expected.

This is the code snippet:

soup = BeautifulSoup(html_doc, 'html.parser')

all_findings = soup.findAll('div',attrs={'class':re.compile(r'common text .*')})

where html_doc is a string containing the following HTML:

<div class="common text sighting_4619012">

  <div class="hide-c">
    <div class="icon location"></div>
    <p class="reason"></p>
    <p class="small">These will not appear</p>
    <span class="button secondary ">wait</span>
  </div>

  <div class="show-c">
  </div>

</div>

But all_findings comes out as an empty list, although it should contain one item.

It works when the pattern matches a single class name exactly:

all_findings = soup.findAll('div',attrs={'class':re.compile(r'hide-c')})

I am using bs4.

Upvotes: 1

Views: 597

Answers (2)

alecxe

Reputation: 474141

To extend @Andy's answer, you can make a list of class names and compiled regular expressions:

soup.find_all('div', {'class': ["common", "text", re.compile(r'sighting_\d{5}')]})

Note that, in this case, you'll get the div elements that have any one of the specified classes/patterns; in other words, the filters are combined with "or": a class equal to common, or equal to text, or matching sighting_ followed by five digits.
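To illustrate the "or" behaviour, and why the question's pattern can fail when each class name is tested on its own, here is a simplified, stdlib-only model. It is only a sketch of per-class matching, not Beautiful Soup's actual code, and `matches_any` is a hypothetical helper name.

```python
import re

def matches_any(class_attr, filters):
    """Return True if any filter matches any single class name (simplified model)."""
    classes = class_attr.split()  # a multi-valued attribute becomes a list of names
    for name in classes:
        for f in filters:
            if isinstance(f, str):
                if f == name:      # plain strings must equal one class name
                    return True
            elif f.search(name):   # compiled patterns are searched in one class name
                return True
    return False

class_attr = "common text sighting_4619012"

# The question's pattern spans several class names, so no single name matches it.
print(matches_any(class_attr, [re.compile(r'common text .*')]))
# A list of filters acts as "or": one hit on one class name is enough.
print(matches_any(class_attr, ["common", "text", re.compile(r'sighting_\d{5}')]))
print(matches_any("text other", ["common", "text", re.compile(r'sighting_\d{5}')]))
```

In this model the last call also succeeds, even though the element has neither common nor a sighting_ class, which is why the list form can return more elements than intended.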

If you want to have them joined with "and", one option would be to turn off the special treatment for "class" attributes by having the document parsed as "xml" (this parser requires the lxml package to be installed):

soup = BeautifulSoup(html_doc, 'xml')
all_findings = soup.find_all('div', class_=re.compile(r'common text sighting_\d{5}'))
print(all_findings)
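Another way to get "and" semantics while keeping 'html.parser' might be a function filter that checks the list of class names directly. The sketch below is an assumption-laden example rather than the method from this answer: `has_sighting_classes` and `SIGHTING` are hypothetical names, and the predicate is written against a plain list so it can be tried without a parsed document.

```python
import re

# Hypothetical pattern: a class name of the form sighting_<digits>.
SIGHTING = re.compile(r'^sighting_\d+$')

def has_sighting_classes(classes):
    """True if `classes` contains 'common', 'text' and a sighting_<digits> name."""
    classes = classes or []  # tag.get('class') is None when there is no class attribute
    return (
        'common' in classes
        and 'text' in classes
        and any(SIGHTING.match(c) for c in classes)
    )

# With bs4 this could be passed as a function filter, for example:
# soup.find_all(lambda tag: tag.name == 'div' and has_sighting_classes(tag.get('class')))

print(has_sighting_classes(['common', 'text', 'sighting_4619012']))
print(has_sighting_classes(['hide-c']))
print(has_sighting_classes(None))
```

Because the check works on the individual class names, it does not depend on the order in which the classes appear in the attribute.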

Upvotes: 0

Andy

Reputation: 50620

Instead of using a regular expression, put the classes you are looking for in a list (this matches any div that has at least one of the listed classes):

all_findings = soup.findAll('div',attrs={'class':['common', 'text']})

Example code:

from bs4 import BeautifulSoup

html_doc = """<div class="common text sighting_4619012">

  <div class="hide-c">
    <div class="icon location"></div>
    <p class="reason"></p>
    <p class="small">These will not appear</p>
    <span class="button secondary ">wait</span>
  </div>

  <div class="show-c">
  </div>

</div>"""
soup = BeautifulSoup(html_doc, 'html.parser')
all_findings = soup.findAll('div',attrs={'class':['common', 'text']})
print(all_findings)

This outputs:

[<div class="common text sighting_4619012">
<div class="hide-c">
<div class="icon location"></div>
<p class="reason"></p>
<p class="small">These will not appear</p>
<span class="button secondary ">wait</span>
</div>
<div class="show-c">
</div>
</div>]

Upvotes: 2
