user3562812

Reputation: 1829

Extracting text node inside a tag that has a child element in beautifulsoup4

The HTML that I am parsing and scraping has the following code:

<li> <span> 929</span> Serve Returned </li>

How can I extract just the text node of <li>, "Serve Returned" in this case, with BeautifulSoup?

.string doesn't work since <li> has a child element, and .text also returns the text inside <span>.

Upvotes: 6

Views: 1195

Answers (2)

Totem

Reputation: 7349

I used the str.replace method for this:

>>> li = soup.find('li')  # or however you need to drill down to the <li> tag
>>> # remove the <span> text from the full text, then trim the leftover whitespace
>>> mytext = li.text.replace(li.find('span').text, "").strip()
>>> print(mytext)
Serve Returned

Upvotes: 2

Hooked

Reputation: 88148

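import bs4
html = "<li> <span> 929</span> Serve Returned </li>"
# name the parser explicitly so the result does not depend on what is installed
soup = bs4.BeautifulSoup(html, "html.parser")
# recursive=False restricts the search to direct children of <li>
print(soup.li.find_all(string=True, recursive=False))

This gives:

[' ', ' Serve Returned ']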

The first element is the whitespace text node that sits before the <span>. Because the search is not recursive, this method returns the text before, after, and in between any child elements, but never the text inside them.
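If you want a single clean string rather than a list, the direct text nodes can be joined and stripped. Below is a minimal sketch of that idea; the helper name own_text is made up for illustration and is not part of BeautifulSoup, and it assumes you want the pieces concatenated as-is.

```python
from bs4 import BeautifulSoup


def own_text(tag):
    """Return the text sitting directly inside `tag`, ignoring child elements.

    `own_text` is a hypothetical helper name, not a BeautifulSoup API.
    """
    # string=True matches text nodes; recursive=False keeps only direct children
    pieces = tag.find_all(string=True, recursive=False)
    # join the pieces and trim the surrounding whitespace
    return "".join(pieces).strip()


html = "<li> <span> 929</span> Serve Returned </li>"
soup = BeautifulSoup(html, "html.parser")
result = own_text(soup.li)
print(result)
```

Note that HTML comments directly inside the tag are also text nodes, so they would be included; filter them out first if your markup contains any.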

Upvotes: 4
