Reputation: 39
import requests
from bs4 import BeautifulSoup
URL = 'https://www.mohfw.gov.in/'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
table = soup.find('table')
table_body = table.find_all('tbody')
print(table_body)
I want the tbody that sits outside the HTML comment. Every tr and td contains a span, and these are nested many layers deep.
Upvotes: 0
Views: 208
Reputation: 22440
Some of the tbody content you want from that page is generated dynamically, but if you look in the browser dev tools you can find a link that returns the same data as JSON. All of the data should be there.
Try this:
import requests

# JSON endpoint the page loads its table data from (found via dev tools)
URL = 'https://www.mohfw.gov.in/data/datanew.json'
page = requests.get(URL, headers={"x-requested-with": "XMLHttpRequest"})

# Each item is one row of the table
for item in page.json():
    sno = item['sno']
    state_name = item['state_name']
    active = item['active']
    positive = item['positive']
    cured = item['cured']
    death = item['death']
    new_active = item['new_active']
    new_positive = item['new_positive']
    new_cured = item['new_cured']
    new_death = item['new_death']
    state_code = item['state_code']
    print(sno, state_name, active, positive, cured, death, new_active, new_positive, new_cured, new_death, state_code)
The output looks like this:
2 Andaman and Nicobar Islands 677 2945 2231 37 635 2985 2309 41 35
1 Andhra Pradesh 89932 371639 278247 3460 92208 382469 286720 3541 28
3 Arunachal Pradesh 899 3412 2508 5 987 3555 2563 5 12
4 Assam 19518 94592 74814 260 19535 96771 76962 274 18
5 Bihar 19716 124536 104301 519 19823 126714 106361 530 10
6 Chandigarh 1456 3209 1713 40 1539 3376 1796 41 04
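If you want to do arithmetic on these numbers rather than just print them, a minimal sketch like the one below may help. It assumes the feed is a JSON array of objects whose values arrive as strings (the `04` state code in the output above suggests this, but I have not verified it for every field). `SAMPLE`, `COUNT_FIELDS` and `parse_rows` are names made up for this illustration, and the two records are copied from the output above so the snippet runs without a network call.

```python
import json

# Two records copied from the sample output above. Assumption: the real
# feed returns a JSON array of objects with these keys, values as strings.
SAMPLE = '''
[
  {"sno": "2", "state_name": "Andaman and Nicobar Islands", "active": "677",
   "positive": "2945", "cured": "2231", "death": "37", "new_active": "635",
   "new_positive": "2985", "new_cured": "2309", "new_death": "41",
   "state_code": "35"},
  {"sno": "6", "state_name": "Chandigarh", "active": "1456",
   "positive": "3209", "cured": "1713", "death": "40", "new_active": "1539",
   "new_positive": "3376", "new_cured": "1796", "new_death": "41",
   "state_code": "04"}
]
'''

# Fields that hold counts and should become integers
COUNT_FIELDS = ("active", "positive", "cured", "death",
                "new_active", "new_positive", "new_cured", "new_death")


def parse_rows(records):
    """Return one dict per state with the count fields converted to int."""
    rows = []
    for item in records:
        # Keep the name and code as strings so "04" does not lose its zero
        row = {"state_name": item["state_name"],
               "state_code": item["state_code"]}
        for field in COUNT_FIELDS:
            # int() accepts both "677" and 677; it raises ValueError on ""
            row[field] = int(item[field])
        rows.append(row)
    return rows


rows = parse_rows(json.loads(SAMPLE))
total_active = sum(row["active"] for row in rows)
print(total_active)  # 2133
```

With live data you would pass `page.json()` from the snippet above to `parse_rows` instead of `json.loads(SAMPLE)`. If any row has an empty or missing count, `int()` will raise, so you may need to skip or default those rows.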
Upvotes: 1