Reputation: 835
I'm honestly finding BeautifulSoup too difficult; the documentation doesn't explain the basics I'm looking for.
I am trying to return the string inside a tag that has an attribute:
<span class="on">6220</span>
But running this:
def fetch_online():
    users = page('span', {'class' : 'on'})
    return str(users)
Gives me [<span class="on">6220</span>]. So I figured I'm doing it all wrong. What is the way to just get a simple string out of a tag?
Upvotes: 1
Views: 79
Reputation:
It's true that BeautifulSoup is not so easy to understand, but it can be very useful sometimes ;)
So, to take FlopCoder's example and explain it a little more:
from bs4 import BeautifulSoup
html = '<span class="on">6220</span>'  # HTML code, perhaps fetched from a website
soup = BeautifulSoup(html, 'html.parser')  # create a soup object from your HTML code
x = soup.find('span', {'class' : 'on'})  # search for the first span tag in the code with class "on"
print(x.text)  # .text gives only the text inside the tag: <tag>text</tag>
If you have more than one to find, you need to do:
x = soup.findAll('span', {'class' : 'on'})
for span in x:
    print(span.text)
This last example uses findAll (spelled find_all in newer versions). It creates a list of all the span tags with class "on" in the code, so you can then run a for loop over it.
your_object.text --> returns the text
your_object.a --> returns the first <a> tag inside it, i.e. the link (and so on ...)
Hope this helps a little!
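As a rough sketch of the points above (first match versus all matches, and text versus a child tag), something like the following should work with bs4. The sample markup, including the second span and the /users link, is made up for illustration, and find_all is the bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this sketch
html = ('<div><span class="on">6220</span>'
        '<span class="on">17</span>'
        '<a href="/users">all users</a></div>')
soup = BeautifulSoup(html, "html.parser")

first = soup.find("span", {"class": "on"})      # first matching tag only
every = soup.find_all("span", {"class": "on"})  # every matching tag, list-like

first_text = first.text                      # text inside the first match
all_texts = [span.text for span in every]    # text inside each match

link_tag = soup.div.a          # attribute access gives the first <a> tag
link_href = link_tag["href"]   # the URL itself is stored in the href attribute

print(first_text)
print(all_texts)
print(link_href)
```

Note that .a gives back a tag object rather than a URL, so the address still has to be read from its href attribute.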
Upvotes: 1
Reputation: 8218
Replace
return str(users)
with
return users[0].string
or
return users[0].contents
The page('span ... call is actually shorthand notation for calling the find_all() function, which returns a list. So you first index into that list, get the tag, then get its contents (note that .contents is itself a list of the tag's children, so you would index into it again to get the string). Running the Python str() function on it is going to give you the whole thing - you want the BeautifulSoup function for getting the string of a tag.
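Put together, a minimal sketch of the corrected function might look like this. It assumes bs4, passes the soup in as a parameter instead of relying on a global page, and adds a guard for the no-match case; those choices are illustrative and not part of the original code.

```python
from bs4 import BeautifulSoup

def fetch_online(page):
    # Calling the soup object like a function is shorthand for find_all()
    users = page("span", {"class": "on"})
    if not users:
        return None  # no matching tag found (guard added for this sketch)
    # Index into the result list, then take the tag's string
    return str(users[0].string)

# Sample markup taken from the question
page = BeautifulSoup('<span class="on">6220</span>', "html.parser")
print(fetch_online(page))
```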
Upvotes: 0
Reputation: 13356
You can do it like this:
from bs4 import BeautifulSoup
html = '<span class="on">6220</span>'  # your HTML source goes here
soup = BeautifulSoup(html, 'html.parser')
x = soup.find('span', {'class' : 'on'})
print(x.text)
print(x.string)
print(x.contents[0])
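The three accessors agree for a simple tag like the one in the question, but they can differ once the tag has nested children. The sketch below assumes bs4, and the nested markup is a made-up example to show the difference.

```python
from bs4 import BeautifulSoup

# Simple case from the question: the tag holds a single string
simple = BeautifulSoup('<span class="on">6220</span>', "html.parser").span
# Hypothetical nested case: a string followed by a child tag
nested = BeautifulSoup('<span class="on">62<b>20</b></span>', "html.parser").span

simple_results = (simple.text, simple.string, simple.contents[0])
nested_text = nested.text          # all text inside the tag, joined together
nested_string = nested.string      # None, because the tag has more than one child
nested_first = nested.contents[0]  # only the first child, here the leading string

print(simple_results)
print(nested_text, nested_string, nested_first)
```

So .text is usually the safest choice when the tag might contain other tags, while .string is only reliable for a tag with a single child.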
Upvotes: 1