Reputation: 217
I know there are multiple questions about URL checks. I am very new to Python, so I am trying to learn from multiple posts and am also looking for a library that might help. I want to gather the following for internal as well as external websites:
Status Code
Status Description
Response Length
Time Taken
The websites look like www.xyz.com, www.abc.log, www.abc.com/xxx/login.html, and more combinations. Below is the initial code:
import socket
from urllib2 import urlopen, URLError, HTTPError

socket.setdefaulttimeout(23)  # timeout in seconds

url = 'https://www.google.com'
try:
    response = urlopen(url)
except HTTPError, e:
    # The server answered, but with an error status; e.code holds it
    print 'The server couldn\'t fulfill the request. Reason:', str(e.code)
except URLError, e:
    # The server was never reached (DNS failure, refused connection, timeout),
    # so there is no HTTP status code here, only e.reason
    print 'We failed to reach a server. Reason:', str(e.reason)
else:
    # getcode() works on the response we already have;
    # no need to fetch the URL a second time with urllib.urlopen
    code = response.getcode()
    print url, "-------->", code
I want to check whether the website exists first, and then run the rest of the checks listed above. How should I organise this so it covers all of the above points for 500+ URLs? Do I need to read them from a txt file? One more point: I have seen that when www.xyz.com is working but www.xyz.com/lmn.html does not exist, the check still shows 200.
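For reference, here is a minimal sketch of the batch approach, assuming the URLs sit one per line in a hypothetical urls.txt and reusing urllib2 from the code above:

import time
from urllib2 import urlopen, URLError, HTTPError

def check(url):
    """Return (status code, status description, response length, seconds taken)."""
    start = time.time()
    try:
        response = urlopen(url)
    except HTTPError, e:
        # Server responded with an error status; there is no body to measure
        return e.code, e.msg, None, time.time() - start
    except URLError, e:
        # Server never reached: no HTTP status code at all
        return None, str(e.reason), None, time.time() - start
    body = response.read()
    return response.getcode(), response.msg, len(body), time.time() - start

with open('urls.txt') as f:   # hypothetical input file, one URL per line
    for line in f:
        url = line.strip()
        if not url:
            continue
        if '://' not in url:
            url = 'http://' + url   # urlopen needs an explicit scheme
        print url, check(url)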
Upvotes: 0
Views: 1531
Reputation: 92
I think you can check the page presence with this code:
import httplib
from urlparse import urlparse

def chkUrl(url):
    p = urlparse(url)
    conn = httplib.HTTPConnection(p.netloc)
    # HEAD fetches only the headers, so the body is never downloaded
    conn.request('HEAD', p.path or '/')   # an empty path would give a bad request line
    resp = conn.getresponse()
    return resp.status < 400

if __name__ == '__main__':
    print chkUrl('http://www.stackoverflow.com')                # True
    print chkUrl('http://stackoverflow.com/notarealpage.html')  # False
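One caveat: httplib.HTTPConnection speaks plain HTTP only, so if some of your URLs use https:// (as in your google.com example), you need httplib.HTTPSConnection for those. A sketch of a variant that picks the class by scheme:

import httplib
from urlparse import urlparse

def chkUrl(url):
    p = urlparse(url)
    # Pick the connection class that matches the URL scheme
    if p.scheme == 'https':
        conn = httplib.HTTPSConnection(p.netloc)
    else:
        conn = httplib.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path or '/')
    return conn.getresponse().status < 400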
Upvotes: 1