Reputation: 1829
The HTML that I am parsing and scraping has the following code:
<li> <span> 929</span> Serve Returned </li>
How can I extract just the text node of <li> ("Serve Returned" in this case) with BeautifulSoup? .string doesn't work since <li> has a child element, and .text also returns the text inside <span>.
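For reference, a minimal reproduction of the behavior described above. It assumes the bs4 package and the built-in html.parser; the parser choice is an assumption, not something the original page specifies.

```python
from bs4 import BeautifulSoup

html = "<li> <span> 929</span> Serve Returned </li>"
soup = BeautifulSoup(html, "html.parser")
li = soup.li

# .string is None because <li> has more than one child node
print(li.string)

# .text / get_text() concatenates every descendant string, including the span's
print(repr(li.get_text()))
```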
Upvotes: 6
Views: 1195
Reputation: 7349
I used the str.replace method for this:
>>> li = soup.find('li') # or however you need to drill down to the <li> tag
>>> # remove the span's text from the full text, then trim the leftover whitespace
>>> mytext = li.text.replace(li.find('span').text, "").strip()
>>> print(mytext)
Serve Returned
Upvotes: 2
Reputation: 88148
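import bs4

html = "<li> <span> 929</span> Serve Returned </li>"
soup = bs4.BeautifulSoup(html, "html.parser")
# string=True selects text nodes; recursive=False restricts the search to direct children of <li>
print(soup.li.find_all(string=True, recursive=False))
This gives:
[' ', ' Serve Returned ']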
The first element is the whitespace text node that sits before the span; the second is the text you want. This method returns the text before, after, and in between any child elements.
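If a single clean string is needed rather than a list, the direct text nodes can be stripped and joined. This is only a sketch: the helper name own_text is hypothetical (it is not part of Beautiful Soup), the html.parser choice is an assumption, and the string= keyword assumes a reasonably recent bs4 (older releases spell it text=).

```python
from bs4 import BeautifulSoup


def own_text(tag):
    """Return the text directly inside `tag`, skipping text in child elements."""
    # string=True matches text nodes only; recursive=False keeps to direct children
    pieces = tag.find_all(string=True, recursive=False)
    # drop whitespace-only nodes and trim the rest before joining
    return " ".join(p.strip() for p in pieces if p.strip())


html = "<li> <span> 929</span> Serve Returned </li>"
soup = BeautifulSoup(html, "html.parser")
print(own_text(soup.li))  # Serve Returned
```

Unlike the str.replace approach, this does not depend on the span's text being distinct from the surrounding text.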
Upvotes: 4