Extra newline character for children of Beautiful Soup

Question

I use BeautifulSoup on a snippet of html as follows:

s = """
             
                Love Heals
                 
            
             
                Friday, March 20, 2015
                 
            
""" 

from bs4 import BeautifulSoup
soup = BeautifulSoup(s, "html.parser")

Why does soup.span return only the first span tag?

Moreover, soup.contents returns a list of length 4. Both span tags are in the list, but the elements at indices 0 and 2 are "\n" newline characters. These newline characters seem useless. Is there a reason why this is done?

alecxe · Accepted Answer

Why does soup.span return only the first span tag?

soup.span is a shortcut for soup.find('span'), which returns only the first span tag in the document.
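A minimal sketch of the difference, assuming a small hypothetical snippet (a div with class views-row-1 holding two outer spans, each wrapping an inner span) and the built-in "html.parser" parser:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two outer spans, each wrapping an inner span.
html = """
<div class="views-row-1">
    <span><span>Love Heals</span></span>
    <span><span>Friday, March 20, 2015</span></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute access is shorthand for find(): it returns the first match only.
first = soup.span
print(first is soup.find("span"))  # True
print(first.get_text())            # Love Heals

# find_all() returns every matching tag, outer and inner, in document order.
all_spans = soup.find_all("span")
print(len(all_spans))              # 4
```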

Moreover, soup.contents returns a list of length 4. Both span tags are in the list, but the elements at indices 0 and 2 are "\n" newline characters. These newline characters seem useless. Is there a reason why this is done?

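By definition, .contents returns a list of all of an element's direct children, including text nodes (instances of the NavigableString class). The whitespace between tags, such as a newline, is a text node too, so it appears in the list.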
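To see this, the sketch below (hypothetical markup, "html.parser" assumed) prints the type of each direct child of a div and then splits the children into text nodes and tags. The indentation between the tags shows up as whitespace-only NavigableString nodes:

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical markup; the newlines and indentation between tags become text nodes.
html = """
<div class="views-row-1">
    <span><span>Love Heals</span></span>
    <span><span>Friday, March 20, 2015</span></span>
</div>
"""

div = BeautifulSoup(html, "html.parser").div

# .contents holds direct children only: text nodes and tags alike.
for child in div.contents:
    print(type(child).__name__, repr(child))

# Split the children by type.
text_nodes = [c for c in div.contents if isinstance(c, NavigableString)]
tags_only = [c for c in div.contents if isinstance(c, Tag)]
print(len(div.contents), len(text_nodes), len(tags_only))  # 5 3 2
```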

If you want the tags only, you can use find_all():

soup.find_all()

And if you want only span tags:

soup.find_all('span')
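Note that find_all() searches all descendants by default, while .contents covers direct children only. A sketch, again assuming a hypothetical div whose two outer spans each wrap an inner span, of how recursive=False gives a tag-only counterpart of .contents:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two outer spans, each wrapping an inner span.
html = """
<div class="views-row-1">
    <span><span>Love Heals</span></span>
    <span><span>Friday, March 20, 2015</span></span>
</div>
"""

div = BeautifulSoup(html, "html.parser").div

# With no arguments, find_all() returns every tag at any depth below the div.
descendant_tags = div.find_all()
# recursive=False limits the search to the div's direct children.
child_tags = div.find_all(recursive=False)

print(len(descendant_tags), len(child_tags))  # 4 2
print([t.get_text() for t in child_tags])
```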

Example:

>>> from bs4 import BeautifulSoup
>>> s = """
...              
...                 Love Heals
...                  
...             
...              
...                 Friday, March 20, 2015
...                  
...             
... """ 
>>> soup = BeautifulSoup(s, "html.parser")
>>> for span in soup.find_all('span'):
...     print(span.text.strip())
... 
Love Heals
Love Heals
Friday, March 20, 2015
Friday, March 20, 2015

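The duplicates appear because the span elements are nested: find_all('span') matches each outer span and also the inner span inside it, and the outer span's text includes its child's text. You can fix this in different ways. For example, you can search only the div's direct children with recursive=False: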

>>> for span in soup.find('div', class_='views-row-1').find_all('span', recursive=False):
...     print(span.text.strip())
... 
Love Heals
Friday, March 20, 2015
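A self-contained sketch of where the duplicates come from and how recursive=False removes them. It assumes a hypothetical div with class views-row-1 whose two outer spans each wrap an inner span, and prints each matched span's parent to show the nesting:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two outer spans, each wrapping an inner span.
html = """
<div class="views-row-1">
    <span><span>Love Heals</span></span>
    <span><span>Friday, March 20, 2015</span></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Each text is matched twice: once via the outer span (parent: div)
# and once via the inner span (parent: span).
for span in soup.find_all("span"):
    print(span.parent.name, span.get_text(strip=True))

# Searching only the div's direct children skips the inner spans.
div = soup.find("div", class_="views-row-1")
texts = [span.get_text(strip=True) for span in div.find_all("span", recursive=False)]
print(texts)  # ['Love Heals', 'Friday, March 20, 2015']
```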

Or you can use a CSS selector, where the child combinator > matches only span elements that are direct children of the div:

>>> for span in soup.select('div.views-row-1 > span'):
...     print(span.text.strip())
... 
Love Heals
Friday, March 20, 2015
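The child combinator > is what removes the duplicates here; a plain descendant selector (a space) would match the inner spans as well. A sketch contrasting the two, assuming a hypothetical div with class views-row-1 whose two outer spans each wrap an inner span (select() relies on the soupsieve package, which recent beautifulsoup4 releases install as a dependency):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two outer spans, each wrapping an inner span.
html = """
<div class="views-row-1">
    <span><span>Love Heals</span></span>
    <span><span>Friday, March 20, 2015</span></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator: only spans directly inside the div.
children = soup.select("div.views-row-1 > span")
# Descendant combinator: spans at any depth, including the inner ones.
descendants = soup.select("div.views-row-1 span")

print(len(children), len(descendants))  # 2 4
print([s.get_text(strip=True) for s in children])
```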
