Reputation: 2596
I have a web document that looks like this:
<table class="table "><col width="75px"></col><col width="1px"></col><tbody><tr class="tablerow style2" prodid="143012"><td class="pricecell"><span class="WebRupee">Rs.</span>
29
<br/><font style="font-size:smaller;font-weight:normal">
3 days
</font></td><td class="spacer"></td><td class="detailcell"><span><span class="label label-default" style="background-color:#3cb521;color:#fff;border:1px solid #3cb521">FULL TT</span>
</span><span><span class="label label-default" style="background-color:#fff;color:#0c7abc;border:1px solid #0c7abc">SMS</span>
</span><div style="padding-top:5px">
29
Full Talktime
</div><div class="detailtext"> 5 Local A2A SMS valid for 1 day </div></td></tr><tr class="tablerow style2" prodid="127535"><td class="pricecell"><span class="WebRupee">Rs.</span>
59
<br/><font style="font-size:smaller;font-weight:normal">
7 days
</font></td><td class="spacer"></td><td class="detailcell"><span><span class="label label-default" style="background-color:#3cb521;color:#fff;border:1px solid #3cb521">FULL TT</span>
</span><span><span class="label label-default" style="background-color:#fff;color:#0c7abc;border:1px solid #0c7abc">SMS</span>
</span><div style="padding-top:5px">
59
Full Talktime
</div><div class="detailtext"> 10 A2A SMS valid for 2 days </div></td></tr><tr class="tablerow style2" prodid="143025"><td class="pricecell"><span class="WebRupee">Rs.</span>
99
<br/><font style="font-size:smaller;font-weight:normal">
12 days
</font></td><td class="spacer"></td><td class="detailcell"><span><span class="label label-default" style="background-color:#3cb521;color:#fff;border:1px solid #3cb521">FULL TT</span>
</span><div style="padding-top:5px">
99
Full Talktime
</div><div class="detailtext"> 10 Local A2A SMS for 2 days only </div>
I want the values 29, 3 days, 29 Full Talktime; 59, 7 days, 59 Full Talktime; and so on.
But I get the whole document when I run the script below.
from bs4 import BeautifulSoup
import requests
r = requests.get("http://www.ireff.in/plans/airtel/karnataka")
data = r.text
soup = BeautifulSoup(data,"html.parser")
table = soup.find('table',{'class':'table'})
print(table)
Where am I going wrong? I want to get those values specifically.
Alternatively, if the table can be converted to a JSON array, that would also be helpful.
Upvotes: 3
Views: 6447
Reputation: 8721
You need to dig deeper to get the specific data you're after. For example, to get the prices, search for the table cells with class "pricecell". Then you can get the contained text and just parse that. Some sample code (not tested):
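# `soup` is the BeautifulSoup object built in the question's script
price_cells = soup.find_all('td', {'class': 'pricecell'})
for price_cell in price_cells:
    # Prints the raw text of each price cell (currency, price, validity)
    print(price_cell.text)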
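To go one step further and get a JSON array, the same idea can be applied row by row. The following is only a sketch under some assumptions: it parses a trimmed copy of the markup from the question instead of fetching the live page, the function name `parse_plans` and the dictionary keys are made up for illustration, and it assumes every row keeps the layout shown in the question (currency, price, and validity in the price cell, with the talktime in the first `div` of the detail cell).

```python
import json
from bs4 import BeautifulSoup

# Trimmed copy of two rows from the question's markup (styles shortened).
SAMPLE_HTML = """
<table class="table "><tbody>
<tr class="tablerow style2" prodid="143012"><td class="pricecell"><span class="WebRupee">Rs.</span>
29
<br/><font style="font-size:smaller;font-weight:normal">
3 days
</font></td><td class="spacer"></td><td class="detailcell"><span><span class="label label-default">FULL TT</span>
</span><div style="padding-top:5px">
29
Full Talktime
</div><div class="detailtext"> 5 Local A2A SMS valid for 1 day </div></td></tr>
<tr class="tablerow style2" prodid="127535"><td class="pricecell"><span class="WebRupee">Rs.</span>
59
<br/><font style="font-size:smaller;font-weight:normal">
7 days
</font></td><td class="spacer"></td><td class="detailcell"><span><span class="label label-default">FULL TT</span>
</span><div style="padding-top:5px">
59
Full Talktime
</div><div class="detailtext"> 10 A2A SMS valid for 2 days </div></td></tr>
</tbody></table>
"""


def parse_plans(html):
    """Return one dict per table row (hypothetical helper, keys are illustrative)."""
    soup = BeautifulSoup(html, "html.parser")
    plans = []
    for row in soup.find_all("tr", class_="tablerow"):
        price_cell = row.find("td", class_="pricecell")
        detail_cell = row.find("td", class_="detailcell")
        if price_cell is None or detail_cell is None:
            continue  # skip rows that do not match the assumed layout

        # Text fragments of the price cell with whitespace stripped,
        # expected to be [currency, price, validity].
        parts = list(price_cell.stripped_strings)
        if len(parts) < 3:
            continue

        talktime_div = detail_cell.find("div")  # first div holds the talktime
        details_div = detail_cell.find("div", class_="detailtext")
        plans.append({
            "price": parts[1],
            "validity": parts[2],
            # Collapse the newlines inside the div into single spaces.
            "talktime": " ".join(talktime_div.get_text().split()) if talktime_div else None,
            "details": details_div.get_text(strip=True) if details_div else None,
        })
    return plans


plans = parse_plans(SAMPLE_HTML)
print(json.dumps(plans, indent=2))
```

With the live page, passing `r.text` from the question's script to `parse_plans` should work the same way, as long as the site still serves this markup.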
Upvotes: 2