Seth

Reputation: 8373

Extracting a value in BeautifulSoup

I have the following code:

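# Assuming BeautifulSoup 3 on Python 2; with bs4 the import is "from bs4 import BeautifulSoup"
from BeautifulSoup import BeautifulSoup

# Read the whole file into a string; the with block closes the file afterwards
with open(path, 'r') as f:
    html = f.read()

soup = BeautifulSoup(html)
# findAll returns a list of every tag whose id matches
schoolname = soup.findAll(attrs={'id':'ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel'})
print schoolname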

which gives:

[<span id="ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel">A B Paterson College, Arundel, QLD</span>]

When I try to access the value (i.e. 'A B Paterson College, Arundel, QLD') using schoolname['value'], I get the following error:

print schoolname['value']
TypeError: list indices must be integers, not str

What am I doing wrong to get that value?

Upvotes: 1

Views: 2606

Answers (2)

kabp

Reputation: 73

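findAll returns a list of matching tags, which is why indexing it with a string raises an exception. I'm pretty sure your problem is solved simply by using find instead of findAll, since find returns a single tag. Note that square brackets on a tag look up HTML attributes, and this span has no value attribute, so read the text inside the tag with:

schoolname.string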

Obviously this only 'works' if you only need one specific value.
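
To make the difference between the two calls concrete, here is a minimal sketch. It assumes the newer bs4 package on Python 3 (where findAll is also spelled find_all) rather than the Python 2 setup in the question, and the inline HTML string stands in for the file contents:

```python
from bs4 import BeautifulSoup

# Stand-in for the file contents from the question
html = ('<span id="ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel">'
        'A B Paterson College, Arundel, QLD</span>')
label_id = 'ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel'

soup = BeautifulSoup(html, 'html.parser')

matches = soup.find_all(attrs={'id': label_id})  # list-like result of tags
first = soup.find(attrs={'id': label_id})        # a single tag, or None if nothing matches

school_text = first.string  # the text inside the tag
school_id = first['id']     # square brackets read HTML attributes, not the text

print(school_text)
print(school_id)
```

Indexing the list first (matches[0].string) reaches the same text, which is handy when more than one tag can match.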

Upvotes: 1

Mark Byers

Reputation: 838116

You can use contents to move down the tree:

>>> for x in schoolname:
...     print x.contents
...
[u'A B Paterson College, Arundel, QLD']

Note that contents is a list, and its items don't necessarily have to be strings. In general it can also hold more tags, or a mixture of strings and tags.
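
As a rough illustration of that mixed case, the sketch below assumes bs4 on Python 3 and uses made-up markup: the nested b tag and the short id are hypothetical and do not come from the page in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same label, but with a nested tag inside it
html = '<span id="name">A B Paterson College, <b>Arundel</b>, QLD</span>'

soup = BeautifulSoup(html, 'html.parser')
span = soup.find(id='name')

parts = span.contents   # a string, then the <b> tag, then another string
text = span.get_text()  # all nested text joined into one string

print(parts)
print(text)
```

With children like these, span.string is None because there is no single string child, so get_text() is the safer way to collect the text.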

Upvotes: 2
