I am trying to access https://www.exploit-db.com/remote using Python's requests module, but I am not getting the page's response. I want to visit all the links on that page.
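import bs4
import requests

def myfun():
    response = requests.get('https://www.exploit-db.com/remote', verify=False)
    print(response.text)
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    # Collect the href of every <a> whose href starts with /download/
    return [a.attrs.get('href') for a in soup.select('a[href^="/download/"]')]

def main():
    urls = myfun()
    for url in urls:
        response = requests.get(url)
        print(response.text)

if __name__ == '__main__':
    main()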
All I get is this warning:
C:\Python27\requests\packages\urllib3\connectionpool.py:791: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
InsecureRequestWarning)
The site uses a firewall that looks for 'scripted' access. It can be defeated simply by setting a User-Agent header; the value Mozilla/5.0 appears to be enough:
import requests

headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.exploit-db.com/remote', headers=headers, verify=False)
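To see what the header changes, this minimal sketch prepares two requests (without sending them) and compares the User-Agent each would carry. It assumes Python 3 and a current requests release; build_request and fetch are hypothetical helper names. Presumably the firewall reacts to the default python-requests/<version> value, though the exact rule it applies is not known. fetch leaves certificate verification on (the requests default), which also avoids the InsecureRequestWarning from the question; verify=False is only worth passing if verification actually fails.

```python
import requests

# URL from the question; 'Mozilla/5.0' is the value the answer found sufficient.
URL = 'https://www.exploit-db.com/remote'
BROWSER_HEADERS = {'User-Agent': 'Mozilla/5.0'}


def build_request(url, headers=None):
    """Prepare, but do not send, a GET request so its headers can be inspected."""
    with requests.Session() as session:
        # prepare_request merges the session's default headers (which include
        # requests' own User-Agent) with the per-request headers; the
        # per-request value wins.
        return session.prepare_request(requests.Request('GET', url, headers=headers))


def fetch(url, headers=BROWSER_HEADERS):
    """Send the request with a browser-like User-Agent and return the body."""
    # Certificate verification stays on (the requests default), so no
    # InsecureRequestWarning is emitted.
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
    return response.text


if __name__ == '__main__':
    print(build_request(URL).headers['User-Agent'])                   # python-requests/<version>
    print(build_request(URL, BROWSER_HEADERS).headers['User-Agent'])  # Mozilla/5.0
```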
Note that the resulting page has no URLs prefixed by /download/; the download links are absolute and start with https://www.exploit-db.com/download. Either adjust your ^= prefix match to that full URL, or use the substring match *=download instead.
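Putting both fixes together gives a minimal sketch, assuming Python 3 with requests and BeautifulSoup 4 installed. download_links, visit_all and the two selector constants are hypothetical names. The *=download selector is the substring match suggested above; the prefix variant quotes the full URL because ':' and '/' are not valid in an unquoted CSS attribute value. urljoin makes every result safe to pass to requests.get whether the hrefs turn out to be absolute or relative.

```python
from urllib.parse import urljoin

import bs4
import requests

PAGE_URL = 'https://www.exploit-db.com/remote'
# Same header the answer uses to get past the site's firewall.
HEADERS = {'User-Agent': 'Mozilla/5.0'}

# Substring match: any href containing "download".
SUBSTRING_SELECTOR = 'a[href*=download]'
# Prefix match on the absolute URL; the value must be quoted.
PREFIX_SELECTOR = 'a[href^="https://www.exploit-db.com/download"]'


def download_links(html, selector=SUBSTRING_SELECTOR, base_url=PAGE_URL):
    """Return absolute URLs of the <a> elements matched by `selector`."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    hrefs = [a.get('href') for a in soup.select(selector)]
    # urljoin leaves absolute hrefs unchanged and resolves relative ones
    # (e.g. /download/...) against the page URL.
    return [urljoin(base_url, href) for href in hrefs]


def visit_all():
    """Fetch the listing page, then request every download link on it."""
    page = requests.get(PAGE_URL, headers=HEADERS)
    page.raise_for_status()
    for url in download_links(page.text):
        response = requests.get(url, headers=HEADERS)
        print(url, response.status_code)
```

Calling visit_all() performs the network requests; download_links on its own only parses HTML, so it can be checked against a saved copy of the page first.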