BeautifulSoup get only the "general" text in a td tag, and nothing in nested tags

Question

Say that my html looks like this:

<td>Potato1 <span>Potato2</span></td>
...
<td>Potato9 <span>Potato10</span></td>

I have beautifulsoup doing this:

for tag in soup.find_all("td"):
    print(tag.text)

And I get

Potato1 Potato2
....
Potato9 Potato10

Would it be possible to just get the text that's inside the tag but not any text nested inside the span tag?

nu11p01n73R · Accepted Answer

You can use .contents:

>>> for tag in soup.find_all("td"):
...     print(tag.contents[0])
...
Potato1
Potato9

What does it do?

A tag's children are available as a list through its .contents attribute.

>>> for tag in soup.find_all("td"):
...     print(tag.contents)
...
[u'Potato1 ', <span>Potato2</span>]
[u'Potato9 ', <span>Potato10</span>]

Since we are only interested in the first element, we use:

print(tag.contents[0])
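
Note that contents[0] only works when the plain text comes before the nested tag. As a minimal sketch, assuming Python 3 with the bs4 package and the built-in html.parser, the direct text can also be collected regardless of position by keeping only the children that are strings. The markup and the direct_text helper below are hypothetical, made up for illustration:

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup modeled on the question: plain text plus a nested span.
html = """
<table>
  <tr><td>Potato1 <span>Potato2</span></td></tr>
  <tr><td>Potato9 <span>Potato10</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def direct_text(tag):
    """Join the strings that are direct children of tag, skipping nested tags."""
    return "".join(
        child for child in tag.contents if isinstance(child, NavigableString)
    ).strip()


results = [direct_text(td) for td in soup.find_all("td")]
for text in results:
    print(text)
```

Unlike contents[0], this still returns the outer text when the span comes first, and it returns an empty string instead of raising an IndexError when the td has no children.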
