Reputation: 9492
This is the portion of the HTML that contains the information I want to extract from a webpage. My intention is to extract just the names and values between the b tags. The result I expect is a list something like this: [On, DVI, 396, 2035, 2551]
...
<div class="txt"><br>
Power: <b>On</b><br><br>
Source: <b>DVI</b><br><br>
Lamp runtime: <b>396</b> hours<br>
Lamp remaining: <b>2035</b> hours<br>
Total operation: <b>2551</b> hours<br>
</div>
...
What I tried was:
from bs4 import BeautifulSoup
import urllib2
url='ip address here'
html=urllib2.urlopen(url).read()
soup=BeautifulSoup(html)
main_div=soup.find("div",{"class":"txt"})
data=main_div.findAll('b').text
What went wrong? FYI, I am a beginner, so please bear with me.
Upvotes: 2
Views: 11431
Reputation: 391
Maybe something like this?
import BeautifulSoup
html = '''<div class="txt"><br>
\nPower: <b>On</b><br><br>
\nSource: <b>DVI</b><br><br>
\nLamp runtime: <b>396</b> hours<br>
\nLamp remaining: <b>2035</b> hours<br>
\nTotal operation: <b>2551</b> hours<br>
\n</div>'''
soup = BeautifulSoup.BeautifulSoup(html)
bTags = []
for i in soup.findAll('b'):
    bTags.append(i.text)
Contents of bTags:
[u'On', u'DVI', u'396', u'2035', u'2551']
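As for what went wrong in the original attempt: findAll returns a list-like ResultSet, and .text is an attribute of an individual tag, not of the whole result, so main_div.findAll('b').text most likely fails with an AttributeError. Below is a minimal sketch of the same idea, assuming Python 3 and the bs4 package rather than the older setup used above; the sample HTML is inlined instead of fetched from the device, and the variable name values is only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; in practice it would be
# downloaded from the device's status page.
html = """<div class="txt"><br>
Power: <b>On</b><br><br>
Source: <b>DVI</b><br><br>
Lamp runtime: <b>396</b> hours<br>
Lamp remaining: <b>2035</b> hours<br>
Total operation: <b>2551</b> hours<br>
</div>"""

# Name the parser explicitly; "html.parser" ships with Python.
soup = BeautifulSoup(html, "html.parser")
main_div = soup.find("div", {"class": "txt"})

# find_all returns a list-like ResultSet, so read .text from each tag in turn.
values = [b.text for b in main_div.find_all("b")]
print(values)
```

The list comprehension does the same job as the explicit loop above; scoping the search to main_div keeps b tags from other parts of the page out of the result.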
Upvotes: 2