user1417933

Reputation: 835

Grabbing a simple string from the first tag

I'm honestly finding BeautifulSoup too difficult; the documentation doesn't explain the basics I'm looking for.

I am trying to return the string inside a tag that has an attribute:

<span class="on">6220</span>

But running this:

def fetch_online():
    # page is a BeautifulSoup object built elsewhere from the fetched HTML
    users = page('span', {'class': 'on'})
    return str(users)

This gives me [<span class="on">6220</span>], so I figure I'm doing it all wrong. What is the way to get just the plain string out of a tag?

Upvotes: 1

Views: 79

Answers (3)

user1422695

It's true that BeautifulSoup is not so easy to understand, but it can be so useful sometimes ;)

So, to take FlopCoder's example and explain it a little more:

from bs4 import BeautifulSoup

html = '<span class="on">6220</span>'  # your HTML code, maybe fetched from a website
soup = BeautifulSoup(html, 'html.parser')  # create a soup object from your HTML code
x = soup.find('span', {'class': 'on'})  # find the first span tag in the code with class "on"
print(x.text)  # .text gives only the text inside the tag, e.g. <span>text</span>
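One thing to watch out for: find() returns None when no tag matches, so x.text would then fail with an AttributeError. Here is a minimal sketch of a guard, assuming BeautifulSoup 4 (the bs4 package) with Python's built-in 'html.parser'; the helper name first_span_text and the sample HTML strings are hypothetical.

```python
from bs4 import BeautifulSoup

def first_span_text(html):
    """Hypothetical helper: text of the first <span class="on">, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find('span', {'class': 'on'})  # None when nothing matches
    return tag.text if tag is not None else None

print(first_span_text('<span class="on">6220</span>'))  # 6220
print(first_span_text('<span class="off">12</span>'))   # None
```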

If there is more than one tag to find, you need to do:

x = soup.find_all('span', {'class': 'on'})
for span in x:
    print(span.text)

This last example uses find_all (called findAll in the older BeautifulSoup 3). It returns a list of all the span tags with class "on" in the code, so you can then loop over it with a for loop.
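To see the list behaviour on its own, here is a small self-contained sketch, assuming BeautifulSoup 4 (bs4) and a made-up page with three spans, two of which have class "on". It collects the text of every match into a plain Python list.

```python
from bs4 import BeautifulSoup

# Hypothetical page with several spans, only some with class "on"
html = '''
<div>
  <span class="on">6220</span>
  <span class="off">15</span>
  <span class="on">42</span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
# find_all() returns a list-like ResultSet of every matching tag
spans = soup.find_all('span', {'class': 'on'})
texts = [span.text for span in spans]
print(texts)  # ['6220', '42']
```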

your_object.text --> returns the text

your_object.a --> returns the first <a> tag inside it (and so on for other tag names ...)
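A quick sketch of the difference between the two, assuming BeautifulSoup 4 (bs4) and a hypothetical snippet with two links: .text gathers all the text inside the tag, while .a is a Tag object (the first <a> inside), so you still read the URL or text off it yourself.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: a <div> containing some text and two links
html = '<div>Users: <a href="/a">Alice</a> and <a href="/b">Bob</a></div>'
soup = BeautifulSoup(html, 'html.parser')
div = soup.div          # attribute access returns the first <div> tag

print(div.text)         # Users: Alice and Bob  (all text, children included)
print(div.a)            # <a href="/a">Alice</a>  (a Tag, not a URL)
print(div.a['href'])    # /a  (attribute lookup on that tag)
print(div.a.text)       # Alice
```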

Hope it can help a little bit !

Upvotes: 1

Ansari

Reputation: 8218

Replace

return str(users)

with

return users[0].string

or

return users[0].contents[0]

The page('span', ...) call is actually shorthand for calling the find_all() method, which returns a list. So you first index into that list to get the tag, then get its text. Running Python's str() function on the list gives you the whole thing, markup included; you want the BeautifulSoup attribute for getting the string of a tag (.string). Note that .contents is itself a list of the tag's children, so it needs indexing too.
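Putting that together, here is a sketch of the corrected fetch_online, assuming BeautifulSoup 4 (bs4). The asker's real page object isn't shown, so the HTML string used to build page here is a stand-in. It also prints what str() produces on the list, which is exactly the output from the question.

```python
from bs4 import BeautifulSoup

# Assumption: page is a BeautifulSoup object; this HTML is a stand-in
page = BeautifulSoup('<p>Online: <span class="on">6220</span></p>', 'html.parser')

def fetch_online():
    users = page('span', {'class': 'on'})  # same as page.find_all('span', {'class': 'on'})
    return users[0].string                 # the single text child of the first match

print(str(page('span', {'class': 'on'})))  # [<span class="on">6220</span>]
print(fetch_online())                       # 6220
```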

Upvotes: 0

Sufian Latif

Reputation: 13356

You can do it like this:

from bs4 import BeautifulSoup

html = '<span class="on">6220</span>'  # your HTML source goes here
soup = BeautifulSoup(html, 'html.parser')
x = soup.find('span', {'class': 'on'})
print(x.text)
print(x.string)
print(x.contents[0])
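All three print 6220 here only because the span holds a single piece of text. A sketch with hypothetical nested markup, assuming BeautifulSoup 4 (bs4), shows how they diverge: .text concatenates all text, .string becomes None when the tag has more than one child, and .contents[0] is simply the first child, which may be a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the span holds a nested <b> plus extra text
html = '<span class="on"><b>6220</b> users</span>'
x = BeautifulSoup(html, 'html.parser').find('span', {'class': 'on'})

print(x.text)         # 6220 users  (all text, concatenated)
print(x.string)       # None  (more than one child, so .string is ambiguous)
print(x.contents[0])  # <b>6220</b>  (first child, here a Tag)
```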

Upvotes: 1
