Reputation: 8373
I have the following code:
f = open(path, 'r')
html = f.read() # no parameters => reads to eof and returns string
soup = BeautifulSoup(html)
schoolname = soup.findAll(attrs={'id':'ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel'})
print schoolname
which gives:
[<span id="ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel">A B Paterson College, Arundel, QLD</span>]
When I try to access the value (i.e. 'A B Paterson College, Arundel, QLD') using schoolname['value'],
I get the following error:
print schoolname['value'] TypeError: list indices must be integers, not str
What am I doing wrong to get that value?
Upvotes: 1
Views: 2606
Reputation: 73
findAll returns a list of matching tags, not a single tag, which is why indexing it with a string raises an exception. Using find instead of findAll returns only the first matching tag. Note that subscripting a tag looks up an HTML attribute, and this span has no value attribute; the text between the tags is available through:
schoolname.string
This only works if you need just the first match.
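As a minimal sketch of the difference between the two calls, assuming Python 3 and the bs4 package (the question itself uses Python 2, and the inline HTML string here stands in for the file read from path):

```python
from bs4 import BeautifulSoup

# Assumption: a one-line stand-in for the page read from disk in the question.
label_id = "ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel"
html = '<span id="%s">A B Paterson College, Arundel, QLD</span>' % label_id

soup = BeautifulSoup(html, "html.parser")

# find_all (findAll in older code) returns a list-like collection of tags,
# so it has to be indexed with an integer.
matches = soup.find_all(attrs={"id": label_id})
first_text = matches[0].string

# find returns the first matching tag, or None if nothing matches.
tag = soup.find(attrs={"id": label_id})
school_text = tag.string if tag is not None else None

print(school_text)
```

Subscripting a tag, as in tag['id'], reads an HTML attribute, so it is not the way to get at the text between the opening and closing tags.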
Upvotes: 1
Reputation: 838116
You can use contents to move down the tree:
>>> for x in schoolname:
...     print x.contents
...
[u'A B Paterson College, Arundel, QLD']
Note that the items in contents don't have to be strings; in general they can also be tags, or a mixture of strings and tags.
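To illustrate the mixed case, here is a small sketch, assuming Python 3 and the bs4 package; the markup and the id "school" are made up for the example and are not from the question's page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the span holds text, a nested <b> tag, then more text.
html = '<span id="school">A B Paterson College, <b>Arundel</b>, QLD</span>'
soup = BeautifulSoup(html, "html.parser")
span = soup.find(id="school")

# contents lists the direct children: a string, a tag, and another string.
children = span.contents
print(children)

# get_text() joins all the nested strings into one piece of text.
full_text = span.get_text()
print(full_text)
```

When the children are mixed like this, span.string is None, so get_text() is the more forgiving way to collect the text.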
Upvotes: 2