Reputation: 4984
I use BeautifulSoup on a snippet of HTML as follows:
s = """<div class="views-row views-row-1 views-row-odd views-row- first">
<span class="views-field views-field-title">
<span class="field-content"><a href="/party-pictures/2015/love-heals">Love Heals</a>
</span>
</span>
<span class="views-field views-field-created">
<span class="field-content">Friday, March 20, 2015
</span>
</span>
</div>"""
soup = BeautifulSoup(s)
Why does s.span only return the first span tag?
Moreover, s.contents returns a list of length 4. Both span tags are in the list, but the 0th and 2nd items are "\n" newline characters. The newline character is useless. Is there a reason why this is done?
Upvotes: 2
Views: 2125
Reputation: 473873
Why does s.span only return the first span tag?
s.span is a shortcut to s.find('span'), which finds only the first occurrence of the span tag.
Moreover, s.contents returns a list of length 4. Both span tags are in the list, but the 0th and 2nd items are "\n" newline characters. The newline character is useless. Is there a reason why this is done?
By definition, .contents returns a list of all of an element's children, including text nodes, which are instances of the NavigableString class. The "\n" entries are the whitespace between the tags in the source markup.
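To see what .contents actually holds, here is a small sketch (again assuming bs4 with html.parser; the two-span markup is invented for illustration) that lists the type of each child and then separates tags from text nodes with isinstance checks:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Invented two-span markup with newlines between the tags.
html = "<div>\n<span>first</span>\n<span>second</span>\n</div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# .contents mixes Tag objects with the whitespace strings between them.
kinds = [type(child).__name__ for child in div.contents]

# Text nodes: here, only the newlines between the tags.
text_nodes = [child for child in div.contents if isinstance(child, NavigableString)]

# Element children only, skipping the text nodes.
tags_only = [child for child in div.contents if isinstance(child, Tag)]
tag_texts = [tag.get_text() for tag in tags_only]
```

Filtering on Tag is one way to ignore the whitespace while still looking only at direct children.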
If you want the tags only, you can use find_all():
soup.find_all()
And, if you want only the span tags:
soup.find_all('span')
Example:
>>> from bs4 import BeautifulSoup
>>> s = """<div class="views-row views-row-1 views-row-odd views-row- first">
... <span class="views-field views-field-title">
... <span class="field-content"><a href="/party-pictures/2015/love-heals">Love Heals</a>
... </span>
... </span>
... <span class="views-field views-field-created">
... <span class="field-content">Friday, March 20, 2015
... </span>
... </span>
... </div>"""
>>> soup = BeautifulSoup(s, "html.parser")
>>> for span in soup.find_all('span'):
...     print(span.text.strip())
...
Love Heals
Love Heals
Friday, March 20, 2015
Friday, March 20, 2015
The reason for the duplicates is that there are nested span elements: each outer span's text includes the text of its inner span. You can fix it in different ways. For example, you can limit the search to the direct children of the div with recursive=False:
>>> for span in soup.find('div', class_='views-row-1').find_all('span', recursive=False):
...     print(span.text.strip())
...
Love Heals
Friday, March 20, 2015
Or, you can make use of CSS selectors:
>>> for span in soup.select('div.views-row-1 > span'):
...     print(span.text.strip())
...
Love Heals
Friday, March 20, 2015
Upvotes: 3