Naveen Manoharan

Reputation: 135

How to extract innerHTML from tag using BeautifulSoup in Python

I am trying to extract the innerHTML from a tag using the following code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

theurl = "http://na.op.gg/summoner/userName=Darshan"
thepage = urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")
rank = soup.findAll('span', {"class": "tierRank"})

However, I am getting [<span class="tierRank">Master</span>] instead. What I want is only the value "Master".

Using soup.get_text instead of soup.findAll doesn't work.

I tried adding .text and .string to the end of the last line, but that did not work either.

Upvotes: 12

Views: 11142

Answers (3)

Adel

Reputation: 1468

If you want to extract the contents of every matching element in bulk, you can use the following:

from bs4 import BeautifulSoup

with open("C:\\test.html") as f:
    soup = BeautifulSoup(f, "html.parser")

# Print the inner HTML of every matching <td>
for data1 in soup.find_all('td', {'class': 'YourClass'}):
    print(data1.decode_contents())

Upvotes: 2

Raman baral

Reputation: 11

Use .decode_contents() if you want the innerHTML (with HTML tags); use .text if you want the innerText (no HTML tags).
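A minimal sketch of the difference, assuming a made-up snippet with a nested tag (the markup and the variable names inner_html and inner_text are illustrative and not taken from the page in the question):

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only
html = '<span class="tierRank">Master <b>I</b></span>'
soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", {"class": "tierRank"})

inner_html = span.decode_contents()  # keeps the nested <b> tag
inner_text = span.text               # drops tags, keeps only the text

print(inner_html)
print(inner_text)
```

When the element contains no nested tags, as with the span in the question, both give the same string.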

Upvotes: 1

Matt Morgan

Reputation: 5313

soup.findAll('span',{"class":"tierRank"}) returns a list of elements that match <span class="tierRank">.

  1. You want the first element from that list.
  2. You want the innerHTML from that element, which can be accessed with the decode_contents() method.

All together:

rank = soup.findAll('span',{"class":"tierRank"})[0].decode_contents()

This will store "Master" in rank.
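Note that indexing with [0] raises an IndexError if nothing matches. A possible variation, sketched here with a hypothetical helper named first_inner_html and a made-up sample string, uses find(), which returns None when there is no match:

```python
from bs4 import BeautifulSoup

def first_inner_html(soup, tag, css_class):
    """Return the inner HTML of the first matching element, or None if there is no match."""
    element = soup.find(tag, {"class": css_class})
    if element is None:
        return None
    # strip() removes surrounding whitespace that the page markup may contain
    return element.decode_contents().strip()

# Hypothetical sample markup, standing in for the downloaded page
sample = '<div><span class="tierRank"> Master </span></div>'
soup = BeautifulSoup(sample, "html.parser")

rank = first_inner_html(soup, "span", "tierRank")
missing = first_inner_html(soup, "span", "noSuchClass")

print(rank)
print(missing)
```

This is only one way to guard against a missing element; checking the length of the list returned by findAll before indexing would work as well.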

Upvotes: 17
