BitByBit

Reputation: 567

Using BeautifulSoup, how to get text only from the specific selector without the text in the children?

I can't figure out how to make BeautifulSoup return only the text that belongs directly to the selected tag. Instead, I also get the text of its children.

For example:

from bs4 import BeautifulSoup
soup = BeautifulSoup('<div id="left"><ul><li>"I want this text"<a href="someurl.com"> I don\'t want this text</a><p>I don\'t want this either</li><li>"Good"<a href="someurl.com"> Not Good</a><p> Not Good either</li></ul></div>', "html5lib") 
x = soup.select('ul > li')
for i in x:
    print(i.text)

Output:

"I want this text" I don't want this textI don't want this either

"Good" Not Good Not Good either

Desired Output:

"I want this text"

"Good"

Upvotes: 3

Views: 3643

Answers (2)

kiviak

Reputation: 1103

from bs4 import BeautifulSoup
from bs4 import NavigableString
soup = BeautifulSoup('<div id="left"><ul><li>"I want this text"<a href="someurl.com"> I don\'t want this text</a><p>I don\'t want this either</li><li>"Good"<a href="someurl.com"> Not Good</a><p> Not Good either</li></ul></div>', "html5lib")
x = soup.select('ul > li')
for i in x:
    if isinstance(i.next_element, NavigableString):  # the node right after the <li> start tag is a string
        print(i.next_element)

Upvotes: -1

alecxe

Reputation: 473863

One option would be to get the first element of the contents list:

for i in x:
    print(i.contents[0])

Another option is to find the first text node:

for i in x:
    print(i.find(text=True))

Both would print:

"I want this text"
"Good"
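Both options above return only the first text node. As a possible variation (a sketch, not part of the original answer), the text nodes that are direct children of a tag can be collected with `find_all(string=True, recursive=False)`, which also picks up text that follows a child element. The helper name `own_text` is made up for illustration, the markup is a tidied copy of the question's HTML with the `<p>` tags closed, and the built-in `"html.parser"` is assumed instead of `html5lib`. In bs4 4.4.0 and later, `string` is the newer name for the `text` argument.

```python
from bs4 import BeautifulSoup

# Tidied version of the question's markup (closing </p> tags added).
html = (
    '<div id="left"><ul>'
    '<li>"I want this text"<a href="someurl.com"> I don\'t want this text</a>'
    '<p>I don\'t want this either</p></li>'
    '<li>"Good"<a href="someurl.com"> Not Good</a><p> Not Good either</p></li>'
    '</ul></div>'
)

soup = BeautifulSoup(html, "html.parser")


def own_text(tag):
    """Hypothetical helper: join only the text nodes that are direct children of `tag`."""
    # recursive=False limits the search to direct children, so text inside
    # nested tags such as <a> or <p> is left out.
    return "".join(tag.find_all(string=True, recursive=False)).strip()


items = soup.select("ul > li")
direct_texts = [own_text(li) for li in items]

for text in direct_texts:
    print(text)
```

With this markup, the loop should print the same two lines as the options above.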

Upvotes: 6
