Reputation: 379
I want to scrape data from the website https://xlnindia.gov.in/frm_G_Cold_S_Query.aspx. I have to select the State as Delhi, District as Adarsh Nagar (4) & click on Search button, and scrape all the information.
So far I tried using the given below code as
import requests
from bs4 import BeautifulSoup
Error was coming as 'HTTPS 443 SSL', which I ressolved using 'verify = False
resp = requests.get('https://xlnindia.gov.in/frm_G_Cold_S_Query.aspx',verify=False)
soup = BeautifulSoup(resp.text,"lxml")
dictinfo = {i['name']:i.get('value','') for i in soup.select('input[name]')}
dictinfo['ddlState']='Delhi'
dictinfo['ddldistrict']='Adarsh Nagar (4)'
dictinfo['__EVENTTARGET']='btnSearch'
dictinfo = {k:(None,str(v)) for k,v in dictinfo.items()}
r=requests.post('https://xlnindia.gov.in/frm_G_Cold_S_Query.aspx',verify=False,files=dictinfo)
r
Error: Response [500]
soup2
Error: Invalid postback or callback argument. Event validation is enabled using <pages enableEventValidation="true"/> in configuration or <%@ Page EnableEventValidation="true" %> in a page. For security purposes, this feature verifies that arguments to postback or callback events originate from the server control that originally rendered them. If the data is valid and expected, use the ClientScriptManager.RegisterForEventValidation method in order to register the postback or callback data for validation.
Can someone please help me to scrape it or get it done.
(I can only use REQUEST & BEAUTIFULSOUP library, no SELENIUM, MECHANIZE,etc. libraries. )
Upvotes: 0
Views: 1233
Reputation: 22440
Try the script below to get the tabular results meant to be populated choosing two dropdown items as you stated above from that webpage. Turn out that you have to make two subsequent post requests to populate the results.
import requests
from bs4 import BeautifulSoup
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
url = 'https://xlnindia.gov.in/frm_G_Cold_S_Query.aspx'
with requests.Session() as s:
s.headers['User-Agent'] = 'Mozilla/5.0'
resp = s.get(url,verify=False)
soup = BeautifulSoup(resp.text,"lxml")
dictinfo = {i['name']:i.get('value','') for i in soup.select('input[name]')}
dictinfo['ddlState'] = 'DL'
res = s.post(url,data=dictinfo)
soup_obj = BeautifulSoup(res.text,"lxml")
payload = {i['name']:i.get('value','') for i in soup_obj.select('input[name]')}
payload['ddldistrict'] = 'ADN'
r = s.post(url,data=payload)
sauce = BeautifulSoup(r.text,"lxml")
for items in sauce.select("#dgDisplay tr"):
data = [item.get_text(strip=True) for item in items.select("td")]
print(data)
Output you may see in the console like:
['Firm Name', 'City', 'Licences', 'Reg. Pharmacists / Comp. Person']
['A ONE MEDICOS', 'DELHI-251/1, GALI NO.1, KH, NO, 739/251/1, NEAR HIMACHAL BHAWAN,SARAI PIPAL THALA, VILLAGE AZAD PUR,', 'R - 2', 'virender kumar, DPH, [22295-17/10/2013]']
['AAROGYAM', 'DELHI-PVT. SHOP NO. 1, GF, 121,VILLAGE BHAROLA', 'R - 2', 'avinesh bhadoriya, DPH, [27033-]']
['ABCO INDIA', 'DELHI-SHOP NO-452/22,BHUSHAN BHAWAN RING ROAD,FLYOVER AZAD PUR', 'W - 2', 'sanjay dubey , SSC, [C-P-03/01/1997]']
['ADARSH MEDICOS', 'DELHI-NORTHERN SIDE B-107, GALI NO. 1,,MAJLIS PARK, VILLAGE BHAROLA,', 'R - 2', 'dilip kumar, BPH, [28036-11/01/2018]']
Upvotes: 1