BeautifulSoup .children or .contents without whitespace between tags

Question

I want all children of a tag without the whitespace between the tags. But BeautifulSoup's .contents and .children also return the whitespace between the tags.

from bs4 import BeautifulSoup
html = """
<ul id="list">
<li>1</li>
<li>2</li>
<li>3</li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')
print(soup.find(id='list').contents)

This prints:

['\n', <li>1</li>, '\n', <li>2</li>, '\n', <li>3</li>, '\n']

Same with

print(list(soup.find(id='list').children))

What I want:

[<li>1</li>, <li>2</li>, <li>3</li>]

Is there any way to tell BeautifulSoup to return only the tags and ignore the whitespace?
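One possibility, sketched here assuming a reasonably recent bs4 4.x release: `Tag.find_all()` accepts `recursive=False`, which restricts the search to direct children, and passing `True` as the name matches any tag. The '\n' text nodes are `NavigableString`s, not tags, so they drop out. The markup is the same list as in the question.

```python
from bs4 import BeautifulSoup

html = """
<ul id="list">
<li>1</li>
<li>2</li>
<li>3</li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')
ul = soup.find(id='list')

# True matches any tag name; recursive=False limits the search to direct
# children, so the '\n' text nodes (NavigableStrings, not Tags) are skipped.
children = ul.find_all(True, recursive=False)
print(children)  # [<li>1</li>, <li>2</li>, <li>3</li>]
```

Because this matches tags only, comments and any non-blank text sitting directly inside the element are left out as well.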

The documentation is not very helpful on this topic: the HTML in its examples does not contain any whitespace between tags.

Indeed, stripping all whitespace between tags from the HTML solves my problem:

html = """<ul id="list"><li>1</li><li>2</li><li>3</li></ul>"""

With this HTML I get only the tags, simply because there is no whitespace between them. But I was hoping to use BeautifulSoup precisely so that I would not have to mess around with the HTML source. I hoped BeautifulSoup would do that for me.
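To have BeautifulSoup do roughly what stripping the source does, one sketch (an assumption, not something the documentation prescribes) is to parse the original HTML unchanged and then detach every whitespace-only text node from the parsed tree with `extract()`. Afterwards `.contents` and `.children` only see the tags.

```python
from bs4 import BeautifulSoup

html = """
<ul id="list">
<li>1</li>
<li>2</li>
<li>3</li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')

# find_all(string=True) returns every text node in the tree as a list, so it is
# safe to detach nodes while looping over it.
for text_node in soup.find_all(string=True):
    if not text_node.strip():   # whitespace-only: '\n', ' ', '\t', ...
        text_node.extract()     # remove it from the parsed tree

print(soup.find(id='list').contents)  # [<li>1</li>, <li>2</li>, <li>3</li>]
```

This mutates the tree, so it also removes whitespace inside `<pre>` and the single spaces between inline elements such as `<b>a</b> <i>b</i>`, which changes what `get_text()` returns for them.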

Another workaround might be:

print(list(filter(lambda t: t != '\n', soup.find(id='list').contents)))

But that seems flaky. Is the whitespace guaranteed to always be exactly '\n'?
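It is not: whitespace written between two tags on the same line comes through as a ' ' text node, so a filter that compares only against '\n' lets it pass. A sketch of a sturdier filter, using a hypothetical helper `is_blank()`, checks whether a child is a `NavigableString` whose `strip()` is empty. The markup below is made up for the demonstration.

```python
from bs4 import BeautifulSoup, NavigableString

# Made-up markup: <li>1</li> and <li>2</li> share a line, separated by a space.
html = '<ul id="list"><li>1</li> <li>2</li>\n<li>3</li></ul>'
ul = BeautifulSoup(html, 'html.parser').find(id='list')

# The workaround from the question: drop only children equal to '\n'.
naive = [t for t in ul.contents if t != '\n']  # the ' ' node slips through

def is_blank(node):
    """Hypothetical helper: True for text nodes made up entirely of whitespace."""
    return isinstance(node, NavigableString) and not node.strip()

# Keep tags and any text node with real content; drop every blank text node.
robust = [t for t in ul.contents if not is_blank(t)]

print(len(naive), len(robust))  # 4 3
```

Unlike a tags-only filter, this keeps text children that carry real content. Note that `str.strip()` also treats a non-breaking space ('\xa0') as whitespace.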

A note to the duplicate marking brigade:

There are many questions asking about BeautifulSoup and whitespace. Most are asking about getting rid of whitespace from the "rendered text".

For example:

BeautifulSoup - getting rid of paragraph whitespace/line breaks

Removing new line '\n' from the output of python BeautifulSoup

Both questions want the text without whitespace. I want the tags without whitespace. The solutions there don't apply to my question.

Another example:

Regular expression for class with whitespaces using Beautifulsoup

This question is about whitespace in the class attribute.
