Reputation: 43
I am trying to extract the text from an HTML file. The HTML file looks like this:
<li class="toclevel-1 tocsection-1">
<a href="#Baden-Württemberg"><span class="tocnumber">1</span>
<span class="toctext">Baden-Württemberg</span>
</a>
</li>
<li class="toclevel-1 tocsection-2">
<a href="#Bayern">
<span class="tocnumber">2</span>
<span class="toctext">Bayern</span>
</a>
</li>
<li class="toclevel-1 tocsection-3">
<a href="#Berlin">
<span class="tocnumber">3</span>
<span class="toctext">Berlin</span>
</a>
</li>
I want to extract the text from the last span tag of each list item. For the first item that would be "Baden-Württemberg", the text after class="toctext". I then want to put these texts into a Python list.
In Python I tried the following:
names = soup.find_all("span",{"class":"toctext"})
My output is this list:
[<span class="toctext">Baden-Württemberg</span>, <span class="toctext">Bayern</span>, <span class="toctext">Berlin</span>]
So how can I extract only the text between the tags?
Thanks to all
Upvotes: 1
Views: 219
Reputation: 4482
With a list comprehension you could do the following:
names = soup.find_all("span",{"class":"toctext"})
print([x.text for x in names])
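For a self-contained check, here is a minimal sketch that parses the markup from the question and builds the list. The `html` string and the choice of the built-in `html.parser` backend are assumptions for illustration; your `soup` object may have been created differently.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (assumed to be available as a string).
html = """
<li class="toclevel-1 tocsection-1">
<a href="#Baden-Württemberg"><span class="tocnumber">1</span>
<span class="toctext">Baden-Württemberg</span>
</a>
</li>
<li class="toclevel-1 tocsection-2">
<a href="#Bayern">
<span class="tocnumber">2</span>
<span class="toctext">Bayern</span>
</a>
</li>
<li class="toclevel-1 tocsection-3">
<a href="#Berlin">
<span class="tocnumber">3</span>
<span class="toctext">Berlin</span>
</a>
</li>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ is BeautifulSoup's keyword argument for filtering on the CSS class.
# get_text(strip=True) returns the tag's text without surrounding whitespace.
names = [span.get_text(strip=True) for span in soup.find_all("span", class_="toctext")]
print(names)  # ['Baden-Württemberg', 'Bayern', 'Berlin']
```

Using `get_text(strip=True)` instead of `.text` only matters if the real page has whitespace or line breaks inside the span; for the markup shown, both give the same strings.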
Upvotes: 0
Reputation: 814
The find_all method returns a ResultSet, which is a subclass of list. Iterate over it to get the text of each tag.
for name in names:
    print(name.text)
Returns:
Baden-Württemberg
Bayern
Berlin
The built-in Python functions dir() and type() are always handy for inspecting an object.
print(dir(names))
[...,
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'append',
'clear',
'copy',
'count',
'extend',
'index',
'insert',
'pop',
'remove',
'reverse',
'sort',
'source']
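As a rough illustration of the `type()` side of this, the sketch below checks what `find_all` hands back and what each element is. The one-line markup string is a made-up stand-in for the real page, and `html.parser` is an assumed backend.

```python
from bs4 import BeautifulSoup

# Tiny stand-in document (hypothetical; the real page is much larger).
soup = BeautifulSoup('<span class="toctext">Bayern</span>', "html.parser")
names = soup.find_all("span", {"class": "toctext"})

# The container is a ResultSet, which subclasses list, so it has the
# usual list methods plus the extra 'source' attribute seen in dir().
print(type(names).__name__)     # ResultSet
print(isinstance(names, list))  # True

# Each element is a Tag, and its .text attribute holds the string content.
print(type(names[0]).__name__)  # Tag
print(names[0].text)            # Bayern
```

This explains why printing the list shows the full `<span>` tags: the elements are Tag objects, not strings, until you read `.text` from each one.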
Upvotes: 2