user404

How to get the value inside a span tag using BeautifulSoup 3

Data format:

<tr><td>Modu</td><td><span class="comments">90</span></td></tr> 
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>

I want to get only the numbers: 90, then 88, and so on. Here is what I tried:

# Python 2.7
# URL I used as input: http://python-data.dr-chuck.net/comments_283660.html
import urllib
from BeautifulSoup import *
url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
r = 0
t = 0
tags = soup('span')
for tag in tags:
    #print tag.get('class', None)
    #print tag.get('class="comments">', None)
    print 'Contents:',tag.contents

The output is:

Contents: [u'100']
Contents: [u'100']
Contents: [u'97']
Contents: [u'95']
....

How can I avoid the "u" and get only 100, 100, 97, 95, ...?


Answers (1)

Padraic Cunningham

You can index the contents list (print 'Contents:', tag.contents[0]) or, better, just pull the text from the span:

tags = soup('span')
for tag in tags:
    print('Contents:',tag.text)

Using your link, this gives:

('Contents:', u'100')
('Contents:', u'100')
('Contents:', u'97')
('Contents:', u'95')
('Contents:', u'95')
('Contents:', u'94')
('Contents:', u'93')
('Contents:', u'92')
('Contents:', u'84')
('Contents:', u'78')
('Contents:', u'78')
('Contents:', u'76')
('Contents:', u'69')
('Contents:', u'64')
('Contents:', u'60')
('Contents:', u'58')
('Contents:', u'53')
('Contents:', u'51')
('Contents:', u'49')
('Contents:', u'49')
('Contents:', u'45')
('Contents:', u'45')
('Contents:', u'45')
('Contents:', u'44')
('Contents:', u'39')
('Contents:', u'38')
('Contents:', u'37')
('Contents:', u'35')
('Contents:', u'34')
('Contents:', u'33')
('Contents:', u'32')
('Contents:', u'32')
('Contents:', u'30')
('Contents:', u'29')
('Contents:', u'28')
('Contents:', u'27')
('Contents:', u'21')
('Contents:', u'19')
('Contents:', u'16')
('Contents:', u'16')
('Contents:', u'15')
('Contents:', u'13')
('Contents:', u'13')
('Contents:', u'12')
('Contents:', u'11')
('Contents:', u'9')
('Contents:', u'6')
('Contents:', u'2')
('Contents:', u'1')
('Contents:', u'1')

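The u just means you have unicode strings. It only shows up because a list or tuple is printed using its repr, so in Python 2 print 'Contents:', tag.text (without parentheses) prints just the number. If you really want to remove it, call str(tag.text); if you want integers, call int(tag.text). Also, I would recommend upgrading to bs4.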
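For reference, here is a minimal Python 3 sketch of the bs4 route, assuming the beautifulsoup4 package is installed. The function name extract_comment_counts is just an illustrative name, and the sample HTML is the data format from the question. It filters on the comments class and converts each value with int(), so there is no u prefix to deal with:

```python
# Python 3 sketch using bs4 (pip install beautifulsoup4); not the original poster's code.
from bs4 import BeautifulSoup


def extract_comment_counts(html):
    """Return the numbers inside <span class="comments"> tags as ints.

    extract_comment_counts is a hypothetical helper name used for illustration.
    """
    # "html.parser" is Python's built-in parser, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    # class_ (with a trailing underscore) filters on the CSS class, because
    # "class" is a reserved word in Python.
    spans = soup.find_all("span", class_="comments")
    # get_text() returns a str (always Unicode in Python 3); int() converts it.
    return [int(span.get_text()) for span in spans]


if __name__ == "__main__":
    # Sample rows taken from the question's data format.
    sample = (
        "<table>"
        '<tr><td>Modu</td><td><span class="comments">90</span></td></tr>'
        '<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>'
        "</table>"
    )
    print(extract_comment_counts(sample))  # [90, 88]
```

To run it against a live page instead, you could pass urllib.request.urlopen(url).read() to the function. Taking an HTML string rather than a URL keeps the parsing easy to test without network access.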
