pawelty

Scraping with requests instead of Selenium

I have already managed to scrape this website with Selenium. However, because Selenium is slow, I'd like to extract the data with a POST request instead, but I'm not sure whether that's possible.

I am trying the default search parameters with the name 'Amit Kumar'. I expected the following code to work:

import requests

start_url = "http://bombayhighcourt.nic.in/party_query.php"
raw_data = "m_hc=01&m_side=C&pageno=1&m_party=Amit+Kumar&petres=P&myr=2017&submit1=Submit"
json={"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}

requests.post(start_url, json=json, data=raw_data).text

However, all I get back is an empty template with no data. I have also tried passing the raw data as a dictionary, without success. Do I really need Selenium for this kind of project?

Answers (1)

zwer

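You need to pass your headers (such as User-Agent) through the headers argument. The json argument is meant for sending a JSON request body, not headers, and requests ignores it entirely when non-empty data is also passed, so your User-Agent was never sent. And if you're building the query string by hand, you can append it to the URL and issue a GET request: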

import requests

start_url = "http://bombayhighcourt.nic.in/party_query.php"
raw_data = "m_hc=01&m_side=C&pageno=1&m_party=Amit+Kumar&petres=P&myr=2017&submit1=Submit"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}

res = requests.get("{}?{}".format(start_url, raw_data), headers=headers)
print(res.text)  # or do whatever you want with the response
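To see why the original attempt never sent its User-Agent, you can build both versions with `requests.Request(...).prepare()`, which constructs the request without sending anything over the network. This is a minimal offline sketch; the shortened `"Mozilla/5.0"` string is just a placeholder, not the full header from the question.

```python
import requests

url = "http://bombayhighcourt.nic.in/party_query.php"
ua = {"User-Agent": "Mozilla/5.0"}  # placeholder User-Agent
form = "m_hc=01&m_side=C&pageno=1&m_party=Amit+Kumar&petres=P&myr=2017&submit1=Submit"

# The question's version: the header dict is passed as json= alongside data=
wrong = requests.Request("POST", url, json=ua, data=form).prepare()

# The fixed version: the header dict is passed as headers=
right = requests.Request("POST", url, headers=ua, data=form).prepare()

print(wrong.headers.get("User-Agent"))  # None: the dict never became a header
print(wrong.body)                       # only the form string; json= was dropped
print(right.headers.get("User-Agent"))  # Mozilla/5.0
```

A bare `Request.prepare()` skips the session defaults, so `None` here simply means your header was not included. A real `requests.post()` call goes through a session that adds its own default `python-requests/...` User-Agent instead.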

And if you want, requests can build your query string for you:

import requests

start_url = "http://bombayhighcourt.nic.in/party_query.php"
raw_data = {"m_hc": "01",
            "m_side": "C",
            "pageno": 1,
            "m_party": "Amit Kumar",  # plain space; requests encodes it as '+'
            "petres": "P",
            "myr": 2017,
            "submit1": "Submit"}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}

res = requests.get(start_url, params=raw_data, headers=headers)
print(res.text)  # or do whatever you want with the response
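`params` takes raw, unencoded values and requests does the URL-encoding itself. A quick offline check with `prepare()` shows the URL that would be requested, and why writing a pre-encoded `+` into the dict is not what you want (only the `m_party` field is used here, to keep the output short):

```python
import requests

url = "http://bombayhighcourt.nic.in/party_query.php"

# A plain space in the value is encoded as '+' in the query string
ok = requests.Request("GET", url, params={"m_party": "Amit Kumar"}).prepare()

# A literal '+' is itself percent-encoded, so the server would receive
# the string "Amit+Kumar" with a real plus sign instead of a space
double = requests.Request("GET", url, params={"m_party": "Amit+Kumar"}).prepare()

print(ok.url)      # http://bombayhighcourt.nic.in/party_query.php?m_party=Amit+Kumar
print(double.url)  # http://bombayhighcourt.nic.in/party_query.php?m_party=Amit%2BKumar
```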

UPDATE: I've checked the source of your page, and you're submitting to the wrong URL. When you open party_query.php, you're served a form whose action points to http://bombayhighcourt.nic.in/partyquery_action.php instead, and you're supposed to POST data matching that form's fields to that URL. To get your desired response you can use:

import requests

start_url = "http://bombayhighcourt.nic.in/partyquery_action.php"
raw_data = {"m_hc": "01",  # 01: Bombay; 02: Aurangabad; 03: Nagpur
            "m_side": "C",  # C: Civil; CR: Criminal; OS: Original
            "pageno": 1,  # page number
            "m_party": "Amit Kumar",  # search query
            "petres": "P",  # P: Petitioner; R: Respondent
            "myr": 2017}  # valid range 1965-2017
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}

res = requests.post(start_url, data=raw_data, headers=headers)
print(res.text)  # or do whatever you want with the response
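Passing the form fields as a dict to `data=` (rather than as a pre-built string) lets requests form-encode them and set a `Content-Type: application/x-www-form-urlencoded` header, which PHP scripts generally rely on to populate `$_POST`. A raw string body gets no Content-Type at all. This offline sketch, using `prepare()` so nothing is sent, shows what each variant would transmit:

```python
import requests

url = "http://bombayhighcourt.nic.in/partyquery_action.php"
form = {"m_hc": "01", "m_side": "C", "pageno": 1,
        "m_party": "Amit Kumar", "petres": "P", "myr": 2017}

# Dict body: requests urlencodes it and adds the form Content-Type
req = requests.Request("POST", url, data=form).prepare()
print(req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(req.body)  # m_hc=01&m_side=C&pageno=1&m_party=Amit+Kumar&petres=P&myr=2017

# String body: sent verbatim, with no Content-Type header
raw = requests.Request("POST", url, data="m_party=Amit+Kumar").prepare()
print(raw.headers.get("Content-Type"))  # None
```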
