linchenkarenUT

Reputation: 53

How to only select certain p tags without children?

I am new to BeautifulSoup and have a basic question about scraping information from university course websites. The HTML is as follows. I'd like to get the text inside the <p> tags, but only from <p> tags that have no child tags such as <strong> or <em>.

Desired text: This course introduces....

Really appreciate your help!

<p>
<strong>MSDS 402 Introduction to Data Science</strong>
</p >
<p>This course introduces.....</p >
<p>
<em>Prerequisites: None.</em>
</p >
<p><a aria-label="MSDS 402-DL Section, ID#: 4765" class="link-list" href=" ">View MSDS 402-DL Sections</a ></p >

Upvotes: 1

Views: 873

Answers (1)

Andrej Kesely

Reputation: 195458

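You can use the CSS selector p:not(:has(*)), which selects only those <p> tags that contain no child tags. Here :has(*) matches any element that contains at least one other element, and :not(...) inverts that match.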

For example:

from bs4 import BeautifulSoup


txt = '''<p>
<strong>MSDS 402 Introduction to Data Science</strong>
</p >
<p>This course introduces.....</p >
<p>
<em>Prerequisites: None.</em>
</p >
<p><a aria-label="MSDS 402-DL Section, ID#: 4765" class="link-list" href=" ">View MSDS 402-DL Sections</a ></p >'''


soup = BeautifulSoup(txt, 'html.parser')

for p in soup.select('p:not(:has(*))'):
    print(p)

Prints:

<p>This course introduces.....</p>
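
If you want only the text rather than the whole tag, something like the following should work. This is a minimal sketch, not the only way to do it: the names html, texts_css, texts_filter and is_childless_p are made up for illustration, and the sample HTML is a shortened copy of the snippet from the question. The second variant avoids CSS selectors by passing a filter function to find_all().

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (illustrative only).
html = '''<p>
<strong>MSDS 402 Introduction to Data Science</strong>
</p>
<p>This course introduces.....</p>
<p>
<em>Prerequisites: None.</em>
</p>'''

soup = BeautifulSoup(html, 'html.parser')

# Variant 1: same selector as above, keeping only the stripped text.
texts_css = [p.get_text(strip=True) for p in soup.select('p:not(:has(*))')]


# Variant 2: no CSS selector. tag.find(True) returns the first
# descendant tag, or None when the tag holds only text.
def is_childless_p(tag):
    return tag.name == 'p' and tag.find(True) is None


texts_filter = [p.get_text(strip=True) for p in soup.find_all(is_childless_p)]

print(texts_css)
print(texts_filter)
```

Both lists should contain the single string 'This course introduces.....'. The selector version is shorter; the filter function is easier to extend if you later need extra conditions, such as skipping empty paragraphs.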

Upvotes: 2
