Reputation: 1000
I have already managed to scrape this website with Selenium. However, because it is slow, I'd like to extract the data with a POST request, but I am not even sure that's possible.
I am trying with the default search parameters and the name 'Amit Kumar'. I thought I'd achieve that with the following code:
start_url = "http://bombayhighcourt.nic.in/party_query.php"
raw_data = "m_hc=01&m_side=C&pageno=1&m_party=Amit+Kumar&petres=P&myr=2017&submit1=Submit"
json={"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}
requests.post(start_url, json=json, data=raw_data).text
However, what I get is just an empty template without data. I have also tried passing the raw data as a dictionary, but without success so far. Do I really need Selenium for this type of project?
Upvotes: 0
Views: 1005
Reputation: 25799
You need to pass your headers (like User-Agent) through the headers argument, and if you're sending a hand-made query string you should append it to your URL, so:
import requests
start_url = "http://bombayhighcourt.nic.in/party_query.php"
raw_data = "m_hc=01&m_side=C&pageno=1&m_party=Amit+Kumar&petres=P&myr=2017&submit1=Submit"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}
res = requests.get("{}?{}".format(start_url, raw_data), headers=headers)
print(res.text) # or do whatever you want with the response
And if you want, requests can build your query string for you:
import requests
start_url = "http://bombayhighcourt.nic.in/party_query.php"
raw_data = {"m_hc": "01",
            "m_side": "C",
            "pageno": 1,
            "m_party": "Amit Kumar",  # plain text; requests URL-encodes the space itself
            "petres": "P",
            "myr": 2017,
            "submit1": "Submit"}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}
res = requests.get(start_url, params=raw_data, headers=headers)
print(res.text) # or do whatever you want with the response
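Keep in mind that values passed in a dictionary are encoded for you, so they should be given as plain text. The sketch below uses the standard library's urlencode only to illustrate the effect; as far as I know requests encodes dictionaries for params and data in much the same way, but treat that as an assumption rather than a guarantee about its internals.

```python
from urllib.parse import urlencode

# A plain-text value: the space is encoded once, as "+".
plain = urlencode({"m_party": "Amit Kumar"})
print(plain)        # m_party=Amit+Kumar

# A value that was already encoded by hand: the "+" is escaped again,
# so the server would see a literal plus sign instead of a space.
pre_encoded = urlencode({"m_party": "Amit+Kumar"})
print(pre_encoded)  # m_party=Amit%2BKumar
```

So a pre-encoded string like "Amit+Kumar" belongs in a hand-made query string, while the dictionary form should carry "Amit Kumar".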
UPDATE - I've checked the source of your page and you're submitting to the wrong URL. If you open it, you're served a form that points to http://bombayhighcourt.nic.in/partyquery_action.php instead, and you're supposed to POST data to it matching the fields of that form. So to get your desired response you can use:
import requests
start_url = "http://bombayhighcourt.nic.in/partyquery_action.php"
raw_data = {"m_hc": "01",            # 01: Bombay; 02: Aurangabad; 03: Nagpur
            "m_side": "C",           # C: Civil; CR: Criminal; OS: Original
            "pageno": 1,             # page number
            "m_party": "Amit Kumar", # search query
            "petres": "P",           # P: Petitioner; R: Respondent
            "myr": 2017}             # valid range 1965-2017
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}
res = requests.post(start_url, data=raw_data, headers=headers)
print(res.text) # or do whatever you want with the response
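If you need more than the first page, the same request can be repeated with a different pageno. The following is only a rough sketch: build_payload, fetch_pages and max_pages are hypothetical names of my own, and it assumes the form really pages its results through the pageno field, which I haven't verified beyond the field name. The session argument is meant to be a requests.Session, which reuses the connection between requests.

```python
ACTION_URL = "http://bombayhighcourt.nic.in/partyquery_action.php"
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/58.0.3029.110 Safari/537.36"}


def build_payload(party, year, page=1, bench="01", side="C", role="P"):
    """Return the form fields for one results page (hypothetical helper)."""
    return {"m_hc": bench,     # 01: Bombay; 02: Aurangabad; 03: Nagpur
            "m_side": side,    # C: Civil; CR: Criminal; OS: Original
            "pageno": page,    # assumed to select the results page
            "m_party": party,  # plain text, not pre-encoded
            "petres": role,    # P: Petitioner; R: Respondent
            "myr": year}


def fetch_pages(session, party, year, max_pages=3):
    """Yield the HTML of pages 1..max_pages.

    `session` is expected to be a requests.Session; any object with a
    compatible .post() method works too.
    """
    for page in range(1, max_pages + 1):
        res = session.post(ACTION_URL,
                           data=build_payload(party, year, page),
                           headers=HEADERS,
                           timeout=30)
        res.raise_for_status()  # fail loudly on HTTP errors
        yield res.text


# Example usage (performs real network requests):
#
#     import requests
#     with requests.Session() as session:
#         for html in fetch_pages(session, "Amit Kumar", 2017):
#             print(len(html))
```

You would still have to parse each page and decide when to stop, for example when a page comes back without result rows.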
Upvotes: 3