Rajendra

Reputation: 393

Not receiving response from HTTPS site using requests module

I am trying to access

https://www.exploit-db.com/remote

I am using Python's requests module, but I am not getting the response from the page. I want to visit all the links on that page.

import bs4
import requests

def myfun():
    # verify=False skips certificate checks and triggers InsecureRequestWarning
    response = requests.get('https://www.exploit-db.com/remote', verify=False)
    print(response.text)
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    return [a.attrs.get('href') for a in soup.select('a[href^="/download/"]')]

def main():
    urls = myfun()
    for url in urls:
        response = requests.get(url)
        print(response.text)

This is the output I get:

C:\Python27\requests\packages\urllib3\connectionpool.py:791: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)

Upvotes: 1

Views: 624

Answers (1)

Martijn Pieters

Reputation: 1121466

The site uses a firewall that looks for 'scripted' access. It can simply be defeated by setting a User-Agent header; the value Mozilla/5.0 appears to be enough:

headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.exploit-db.com/remote', headers=headers, verify=False)

Note that the resulting page has no URLs that start with /download/; the download links start with https://www.exploit-db.com/download instead. Either adjust your ^= prefix match to use the full URL, or use *=download (a substring match) instead.
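To illustrate the difference between the two selector options, here is a minimal sketch that runs against a small hand-written HTML fragment rather than the live page. The fragment, the exploit ID in it, and the helper name select_links are assumptions made up for the example; the real page markup may differ.

```python
import bs4

# Hypothetical stand-in for the fetched page: absolute links only,
# as described above. The ID 12345 is made up for illustration.
SAMPLE_HTML = """
<a href="https://www.exploit-db.com/download/12345">Download</a>
<a href="https://www.exploit-db.com/exploits/12345/">View</a>
"""


def select_links(html, selector):
    """Return the href of every element matched by a CSS selector."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    return [a['href'] for a in soup.select(selector)]


# The selector from the question: matches nothing, because no href
# starts with the relative path /download/.
original = select_links(SAMPLE_HTML, 'a[href^="/download/"]')

# Option 1: prefix match (^=) on the full absolute URL.
prefix = select_links(
    SAMPLE_HTML, 'a[href^="https://www.exploit-db.com/download"]')

# Option 2: substring match (*=) anywhere in the href.
substring = select_links(SAMPLE_HTML, 'a[href*="download"]')

print(original)
print(prefix)
print(substring)
```

With the real page you would pass response.text (fetched with the User-Agent header set as shown earlier) in place of SAMPLE_HTML. The substring match is looser and would also pick up any other link that happens to contain the word download, so the full-URL prefix match is likely the safer choice.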

Upvotes: 2
