Extract text from html file with BeautifulSoup/Python

Question

I am trying to extract the text from an HTML file. The HTML file looks like this:


    1
        Baden-Württemberg
    


    
        2
        Bayern
    


    
        3
        Berlin

I want to extract the text from the last span tag in each entry. In the first entry that is "Baden-Württemberg", which comes after class="toctext". I then want to put these values into a Python list.

In Python I tried the following:

names = soup.find_all("span",{"class":"toctext"})

My output is this list:

[<span class="toctext">Baden-Württemberg</span>, <span class="toctext">Bayern</span>, <span class="toctext">Berlin</span>]

So how can I extract only the text between the tags?

Thanks to all

warnerm06 · Accepted Answer

The find_all method returns a ResultSet, which behaves like a list of Tag objects. Iterate over it and read each tag's text attribute to get the text.

for name in names:
    print(name.text)

Returns:

Baden-Württemberg
Bayern
Berlin
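
Since the question asks for a Python list rather than printed output, a list comprehension over the same result is one way to get there. This is a minimal sketch: the markup below is an assumption (the sample in the question lost its tags), and only the toctext class is taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the original sample lost its tags, so this
# structure (including the "tocnumber" class) is assumed for illustration.
html = """
<ul>
  <li><span class="tocnumber">1</span> <span class="toctext">Baden-Württemberg</span></li>
  <li><span class="tocnumber">2</span> <span class="toctext">Bayern</span></li>
  <li><span class="tocnumber">3</span> <span class="toctext">Berlin</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns the matching Tag objects, not their text.
spans = soup.find_all("span", {"class": "toctext"})

# get_text() returns the text inside each tag; strip=True trims whitespace.
names = [span.get_text(strip=True) for span in spans]

print(names)  # ['Baden-Württemberg', 'Bayern', 'Berlin']
```

Here get_text(strip=True) is used instead of .text only to trim stray whitespace; for this markup both give the same strings.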

The built-in Python functions dir() and type() are always handy for inspecting an object.

print(dir(names))

[...,
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort',
 'source']
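
type() can be used the same way as dir(). The sketch below assumes a small stand-in document (the markup is hypothetical) and checks what find_all hands back and what each element is.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line document, used only to have something to inspect.
soup = BeautifulSoup('<span class="toctext">Bayern</span>', "html.parser")
result = soup.find_all("span", {"class": "toctext"})

# The container is a ResultSet, which subclasses the built-in list.
print(type(result).__name__)
print(isinstance(result, list))

# Each element is a Tag, which is why it prints with its markup
# and why .text is needed to get the string inside it.
first = result[0]
print(type(first).__name__)
print(first.text)
```

Seeing that the elements are Tag objects rather than strings explains why the original list printed with the span markup included.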
