zeerock

Reputation: 93

Retrieve all names from html tags using BeautifulSoup

I managed to set up Beautiful Soup and find the tags that I needed. How do I extract all the names from the tags?

tags = soup.find_all("a")
print(tags)

After running the above code, I got the following output:

[<a href="/wiki/Alfred_the_Great" title="Alfred the Great">Alfred the Great</a>, <a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" title="Elizabeth I of England">Queen Elizabeth I</a>, <a href="/wiki/Family_tree_of_Scottish_monarchs" title="Family tree of Scottish monarchs">Family tree of Scottish monarchs</a>, <a href="/wiki/Kenneth_MacAlpin" title="Kenneth MacAlpin">Kenneth MacAlpin</a>]

How do I retrieve the names (Alfred the Great, Queen Elizabeth I, Kenneth MacAlpin, etc.)? Do I need to use a regular expression? Using .string gave me an error.

Upvotes: 0

Views: 77

Answers (2)

Md. Fazlul Hoque

Reputation: 16187

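There is no need for regular expressions. Iterate over the a tags and, for each one, either read its title attribute or get its text with get_text() or .find(text=True). Note that the two can differ: the redirect link has the title "Elizabeth I of England" but the link text "Queen Elizabeth I".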

html='''
<html>
 <body>
  <a href="/wiki/Alfred_the_Great" title="Alfred the Great">
   Alfred the Great
  </a>
  ,
  <a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" title="Elizabeth I of England">
   Queen Elizabeth I
  </a>
  ,
  <a href="/wiki/Family_tree_of_Scottish_monarchs" title="Family tree of Scottish monarchs">
   Family tree of Scottish monarchs
  </a>
  ,
  <a href="/wiki/Kenneth_MacAlpin" title="Kenneth MacAlpin">
   Kenneth MacAlpin
  </a>
 </body>
</html>

'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html,'lxml')

#print(soup.prettify())

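for name in soup.find_all('a'):
    # get_text(strip=True) returns the visible link text without surrounding whitespace
    txt = name.get_text(strip=True)
    # OR read the title attribute (gives "Elizabeth I of England" for the redirect link)
    #txt = name.get('title')
    print(txt)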

Output:

Alfred the Great
Queen Elizabeth I
Family tree of Scottish monarchs
Kenneth MacAlpin
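
If you want the names in a list rather than printed one by one, a minimal sketch along the same lines is shown here. It uses the built-in html.parser instead of lxml so nothing extra needs installing, and the HTML is a shortened copy of the links from the question. The last part shows one plausible reason .string failed: find_all returns a list-like ResultSet, and calling .string on the whole result raises AttributeError. The question does not show the actual error, so treat that as a guess.

```python
from bs4 import BeautifulSoup

# Shortened copy of the links from the question
html = """
<a href="/wiki/Alfred_the_Great" title="Alfred the Great">Alfred the Great</a>,
<a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" title="Elizabeth I of England">Queen Elizabeth I</a>,
<a href="/wiki/Kenneth_MacAlpin" title="Kenneth MacAlpin">Kenneth MacAlpin</a>
"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("a")

# Apply the accessor to each tag, not to the result of find_all as a whole
names = [tag.get_text(strip=True) for tag in tags]   # visible link text
titles = [tag.get("title") for tag in tags]          # value of the title attribute

print(names)
print(titles)

# Assumption: the error in the question came from something like this
try:
    tags.string
except AttributeError as exc:
    print("AttributeError:", exc)
```

The two lists differ only for the redirect link, so pick get_text() if you want what the page displays and title if you want the name of the target article.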

Upvotes: 0

Vishal

Reputation: 2060

You can iterate over the tags and use tag.get('title') to get the value of each tag's title attribute.

Some other ways to do the same: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attributes
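
As a rough illustration of those other ways, here is a small sketch comparing tag.get('title'), tag['title'] and tag.attrs. The second link (with only an id attribute) is made up for the example, to show what happens when title is missing.

```python
from bs4 import BeautifulSoup

# The second <a> is a hypothetical link without a title attribute
html = (
    '<a href="/wiki/Kenneth_MacAlpin" title="Kenneth MacAlpin">Kenneth MacAlpin</a>'
    '<a id="top">Top</a>'
)
soup = BeautifulSoup(html, "html.parser")
with_title, without_title = soup.find_all("a")

# Dictionary-style access and .get() return the same value when the attribute exists
print(with_title["title"])
print(with_title.get("title"))

# .attrs exposes all attributes of the tag as a dict
print(with_title.attrs)

# .get() returns None (or a default you supply) when the attribute is missing
print(without_title.get("title"))
print(without_title.get("title", "no title"))

# Dictionary-style access raises KeyError for a missing attribute
try:
    without_title["title"]
except KeyError:
    print("KeyError: this tag has no title attribute")
```

tag.get('title') is the safer choice when some links might not carry a title, since it does not raise.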

Upvotes: 1
