Peter Arsenault

Reputation: 45

How to use find_all with BeautifulSoup to search for multiple tags or classes?

I'm scraping some HTML that is formatted like this:

<div class="doccontent">
<h3> Section Title 1 </h3>
<div class="line"> My first line </div>
<div class="line"> My second line </div>
<div class="linenumber"> text i don't need </div>

<h3> Section Title 2 </h3>
<div class="line"> My third line </div>
<div class="chapter">Chapter four</div>
<div class="line"> My fourth line </div>
</div>

I only want to capture the h3 and class="line" text. I tried two ways. The first:

for lines in full_text:
    for booktitle in lines.find("h3"):
        linesArr.append(booktitle)
    for line in lines.find_all(class_='line'):
        linesArr.append(line)

This appends all booktitles to the beginning of the list, then starts working on the lines.

The second:

for lines in full_text:
    for line in lines.find_all(['h3', class_="line"]):
        linesArr.append(line)

The second seems more promising to me, but there is a syntax error. The BS4 documentation doesn't cover how to search for a list of tags and classes. Any help would be appreciated.

Upvotes: 1

Views: 96

Answers (1)

QHarr

Reputation: 84465

As mentioned in the comments, you can use CSS "or" syntax (a comma-separated list of selectors) to combine multiple CSS selectors and pass them to select:

data = [item.text for item in soup.select("h3 , .line")]
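For a fuller picture, here is a minimal, self-contained sketch of the same idea run against markup modelled on the question. The `html` string (with the class quotes closed), the `html.parser` choice, and the helper name `is_title_or_line` are my own assumptions for illustration, not something taken from the original post. It also shows a `find_all` variant that takes a function, in case you would rather stay with `find_all`.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question (assumed; class quotes closed).
html = """
<div class="doccontent">
<h3> Section Title 1 </h3>
<div class="line"> My first line </div>
<div class="line"> My second line </div>
<div class="linenumber"> text i don't need </div>

<h3> Section Title 2 </h3>
<div class="line"> My third line </div>
<div class="chapter">Chapter four</div>
<div class="line"> My fourth line </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The comma joins two selectors: any <h3> OR any element with class "line".
# Matches come back in document order, so each title precedes its lines.
data = [item.get_text(strip=True) for item in soup.select("h3, .line")]

# Alternative with find_all(): pass a function that decides per tag.
# is_title_or_line is a hypothetical helper name used only for this sketch.
def is_title_or_line(tag):
    return tag.name == "h3" or "line" in tag.get("class", [])

same_data = [tag.get_text(strip=True) for tag in soup.find_all(is_title_or_line)]

print(data)
# ['Section Title 1', 'My first line', 'My second line',
#  'Section Title 2', 'My third line', 'My fourth line']
```

Note that `.line` matches the class token exactly, so `class="linenumber"` and `class="chapter"` are skipped. The second attempt in the question fails because `class_="line"` is a keyword argument and cannot appear inside a list literal.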

Upvotes: 2
