Reputation: 5531
How can I get the first child?
<div class="cities">
<div id="3232"> London </div>
<div id="131"> York </div>
</div>
How can I get London?
for div in nsoup.find_all(class_='cities'):
    print(div.children.contents)
AttributeError: 'listiterator' object has no attribute 'contents'
Upvotes: 22
Views: 50487
Reputation: 84465
With modern versions of bs4 (certainly bs4 4.7.1+) you have access to the :first-child CSS pseudo-class. Nice and descriptive. Use soup.select_one if you only want the first match, i.e. soup.select_one('.cities div:first-child').text. It is good practice to test that the result is not None before using the .text accessor.
from bs4 import BeautifulSoup as bs
html = '''
<div class="cities">
<div id="3232"> London </div>
<div id="131"> York </div>
</div>
'''
soup = bs(html, 'lxml') #or 'html.parser'
first_children = [i.text for i in soup.select('.cities div:first-child')]
print(first_children)
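The None check mentioned above can be sketched as follows. This is only an illustration, assuming the question's HTML and the built-in 'html.parser'; the helper name first_city is hypothetical and not part of bs4.

```python
from bs4 import BeautifulSoup

html = '''
<div class="cities">
<div id="3232"> London </div>
<div id="131"> York </div>
</div>
'''

def first_city(soup):
    """Return the stripped text of the first child div inside .cities, or None."""
    match = soup.select_one('.cities div:first-child')
    # select_one returns None when nothing matches, so guard before reading text
    if match is None:
        return None
    return match.get_text(strip=True)

soup = BeautifulSoup(html, 'html.parser')
print(first_city(soup))
```

Returning None instead of raising lets the caller decide what a missing element means.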
Upvotes: 15
Reputation: 59974
The current accepted answer gets all cities, when the question only wanted the first.
If you only need the first child, you can take advantage of .children returning an iterator and not a list. An iterator produces its items on the fly, and because we only need the first one, the remaining city elements are never generated (thus saving time).
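for div in nsoup.find_all(class_='cities'):
    # .children also yields whitespace text nodes such as '\n', so skip those lazily
    first_child = next((c for c in div.children if str(c).strip()), None)
    if first_child is not None:
        print(first_child.string.strip())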
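One caveat worth checking with .children: the whitespace between tags is itself a child node, so the first item of the iterator may be a newline string rather than a div. A minimal sketch, assuming the question's HTML and 'html.parser', that filters for Tag objects while staying lazy:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = '''
<div class="cities">
<div id="3232"> London </div>
<div id="131"> York </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
cities = soup.find(class_='cities')

# The first node in .children is the text between the opening tag and the first div
first_node = next(cities.children, None)
print(repr(first_node))

# A generator expression keeps the iteration lazy and stops at the first Tag
first_tag = next((c for c in cities.children if isinstance(c, Tag)), None)
if first_tag is not None:
    print(first_tag.get_text(strip=True))
```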
Upvotes: 9
Reputation: 11543
div.children returns an iterator.
for div in nsoup.find_all(class_='cities'):
    for childdiv in div.find_all('div'):
        print(childdiv.string)  # London, York
The AttributeError was raised because div.children is an iterator, which has no .contents attribute. Note also that .children includes non-tag nodes such as '\n'. Just use a proper child selector to find the specific div.
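As a rough sketch of what selecting the div directly could look like, assuming the question's HTML and 'html.parser': find returns the first matching descendant (or None), and recursive=False restricts find_all to direct children.

```python
from bs4 import BeautifulSoup

html = '''
<div class="cities">
<div id="3232"> London </div>
<div id="131"> York </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
cities = soup.find(class_='cities')

# First matching div below .cities; whitespace text nodes are never returned
first_div = cities.find('div')
if first_div is not None:
    print(first_div.get_text(strip=True), first_div['id'])

# Only direct child divs, ignoring any deeper nesting
direct_divs = cities.find_all('div', recursive=False)
print([d['id'] for d in direct_divs])
```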
(Edit) I can't reproduce your exception. Here's what I did:
In [137]: print foo.prettify()
<div class="cities">
<div id="3232">
London
</div>
<div id="131">
York
</div>
</div>
In [138]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string
   .....:
London
York
In [139]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string, childdiv['id']
   .....:
London 3232
York 131
Upvotes: 14