Reputation: 167
I read that .contents returns the direct children of a tag, and that if we want to iterate over those children we should use .children. But I've tried both and got the same output.
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p></body></html>
"""
soup = BeautifulSoup(html_doc, "html.parser")
title_tag = soup.title
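# Iterate over the tag's direct children via the .children iterator
for child in title_tag.children:
    print(child)
# Iterate over the same children via the .contents list
for child in title_tag.contents:
    print(child)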
Upvotes: 3
Views: 3694
Reputation: 542
Considering that you're talking about BeautifulSoup (you should give us some background context!)...
As said here, the main difference is that with .contents you'll get a list, while with .children you'll get a generator.
It may not seem to make any difference, since you can iterate over both of them, but when you're working with a big set of data, you should prefer a generator to spare your computer's memory.
Picture this: you have a 10k text file and you need to process it one line at a time. When working with a list (for example: with open('t.txt') as f: lines = f.readlines()), you fill a chunk of your memory with data you won't use right away; it just sits there taking up space (not to mention that, depending on your environment, you might not have enough memory). When working with a generator, you get one line at a time, as desired, without the memory consumption.
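To make the file example concrete, here is a minimal sketch of the two approaches. The helper names (count_lines_eager, count_lines_lazy) and the temporary sample file are made up for illustration; only open(), readlines() and plain iteration over the file object are standard Python.

```python
import os
import tempfile


def count_lines_eager(path):
    # readlines() builds a list holding every line before the loop starts
    with open(path) as f:
        lines = f.readlines()
    return sum(1 for _ in lines)


def count_lines_lazy(path):
    # Iterating over the file object yields one line at a time
    with open(path) as f:
        return sum(1 for _ in f)


# Hypothetical sample file, created only for this demonstration
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line\n" * 10_000)
    sample_path = tmp.name

eager_count = count_lines_eager(sample_path)
lazy_count = count_lines_lazy(sample_path)
os.remove(sample_path)
```

Both functions return the same count; the difference is only in how much of the file is held in memory at once.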
Upvotes: 2
Reputation: 77367
The documentation is a bit more subtle than that. It says
Instead of getting them as a list, you can iterate over a tag’s children using the .children generator
But you can iterate over a list directly in a for loop, and you can get an iterator by calling iter() on it, so it seems kind of pointless to even have a .children property. Looking more closely, here's how children is implemented.
# Generator methods
@property
def children(self):
    # return iter() to make the purpose of the method clear
    return iter(self.contents)  # XXX This seems to be untested.
Yep, it is entirely pointless. Those two fragments of code are identical, except that for child in title_tag.contents gets an iterator for the list, while for child in title_tag.children uses the iterator it's been handed.
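The only practical differences show up outside a plain for loop. Here is a rough sketch using a stand-in class rather than BeautifulSoup itself: Node is hypothetical, and its children property simply copies the one-liner quoted above, so the real library may behave differently in other versions.

```python
class Node:
    """Hypothetical stand-in for a bs4 Tag; not the real class."""

    def __init__(self, contents):
        self.contents = list(contents)

    @property
    def children(self):
        # Same one-liner as the implementation quoted above
        return iter(self.contents)


node = Node(["The Dormouse's story"])

# A for loop (or comprehension) sees the same items either way
from_contents = [child for child in node.contents]
from_children = [child for child in node.children]

# .contents is a list, so it supports len() and indexing
first = node.contents[0]
count = len(node.contents)

# .children hands back an iterator, which can be consumed only once
it = node.children
consumed_once = list(it)
consumed_twice = list(it)  # already exhausted, so this is empty
```

So under this implementation the list is the more capable of the two: it can be indexed, measured and re-iterated, while the iterator saves no memory because the list already exists.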
Upvotes: 5