Hamza

Reputation: 167

The difference between .contents and .children

I read that .contents returns the direct children of a tag, and that if we want to iterate over those children we should use .children. But I tried both and got the same output.

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p></body></html>
"""
soup = BeautifulSoup(html_doc, "html.parser")
title_tag = soup.title

for child in title_tag.children:
    print(child)
for child in title_tag.contents:
    print(child)

Upvotes: 3

Views: 3694

Answers (2)

Thiago Cardoso

Reputation: 542

Assuming you're talking about BeautifulSoup (you should give us some background context!)...

As said here, the main difference is that with .contents you'll get a list, while with .children you'll get a generator.

It may not seem to make any difference, since you can iterate over both of them, but when you're working with a big data set, you should prefer a generator to spare your computer's memory.

Picture this: you have a 10k text file, and you need to process it one line at a time. When working with a list (for example: with open('t.txt') as f: lines = f.readlines()), you fill a whole chunk of memory with data you won't use right away; it just sits there taking up space (not to mention that, depending on your environment, you might not have enough memory). When working with a generator, you get one line at a time, as desired, without the memory cost.
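
To make the file example concrete, here is a minimal sketch comparing the two approaches. The function names and the sample file contents are made up for illustration; the only claim is that readlines() builds the full list up front, while iterating over the file object yields one line at a time.

```python
import os
import tempfile


def longest_line_eager(path):
    """Read every line into a list first, then scan the list."""
    with open(path) as f:
        lines = f.readlines()  # the whole file is held in memory as a list
    return max(len(line.rstrip("\n")) for line in lines)


def longest_line_lazy(path):
    """Iterate over the file object, which yields one line at a time."""
    with open(path) as f:
        return max(len(line.rstrip("\n")) for line in f)


# Hypothetical sample file, created only so the sketch can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("short\na much longer line\nmid\n")
    sample_path = tmp.name

eager_result = longest_line_eager(sample_path)
lazy_result = longest_line_lazy(sample_path)
os.remove(sample_path)

print(eager_result, lazy_result)
```

Both functions return the same answer; they differ only in how much of the file is in memory at once.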

Upvotes: 2

tdelaney

Reputation: 77367

The documentation is a bit more subtle than that. It says

Instead of getting them as a list, you can iterate over a tag’s children using the .children generator

But you can iterate over lists directly in a for loop, and you can get an iterator by calling iter(), so it seems kind of pointless to even have a .children property. Looking more closely, here's how children is implemented.

#Generator methods
@property
def children(self):
    # return iter() to make the purpose of the method clear
    return iter(self.contents)  # XXX This seems to be untested.

Yep, it is entirely pointless. Those two fragments of code are identical, except that for child in title_tag.contents gets an iterator for the list, while for child in title_tag.children uses the iterator it's been handed.
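
If you want to see where the two do behave differently outside a plain for loop, here is a small sketch. FakeTag is a toy stand-in written for this answer, not the real bs4 class; it only copies the iter(self.contents) pattern from the snippet quoted above, so treat it as an illustration of list versus iterator behavior and not as a statement about any particular BeautifulSoup version.

```python
class FakeTag:
    """Toy stand-in for a tag; NOT the real bs4 class."""

    def __init__(self, contents):
        self.contents = list(contents)  # a plain list, like .contents

    @property
    def children(self):
        # Same pattern as the quoted source: an iterator over the list.
        return iter(self.contents)


tag = FakeTag(["The Dormouse's story"])

# Looping over either one visits the same items in the same order.
same_items = list(tag.children) == tag.contents

# A list supports len() and indexing.
count = len(tag.contents)
first = tag.contents[0]

# An iterator does not support indexing.
try:
    tag.children[0]
    indexable = True
except TypeError:
    indexable = False

# An iterator is single-pass: once consumed, it stays empty.
kids = tag.children
first_pass = list(kids)
second_pass = list(kids)

# Accessing the property again builds a fresh iterator.
fresh_pass = list(tag.children)

print(same_items, count, indexable, first_pass, second_pass, fresh_pass)
```

So the practical difference is only in what you can do with the returned object (indexing, len(), repeated passes), not in what a single for loop prints.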

Upvotes: 5
