# -*- coding: UTF-8 -*-
import urllib.request
import re
import os
os.system("cls")
url=input("Url Link : ")
if(url[0:8]=="https://"):
    url=url[:4]+url[5:]
if(url[0:7]!="http://"):
    url="http://"+url
try :
    try :
        value=urllib.request.urlopen(url,timeout=60).read().decode('cp949')
    except UnicodeDecodeError :
        value=urllib.request.urlopen(url,timeout=60).read().decode('UTF8')
    par='<title>(.+?)</title>'
    result=re.findall(par,value)
    print(result)
except ConnectionResetError as e:
    print(e)
The TimeoutError has disappeared, but now a ConnectionResetError appears. What is this error? Is it a server problem, meaning I can't solve it myself?
Don't give up!
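Some websites require a specific HTTP header, in this case User-agent. By default urllib sends "Python-urllib/" followed by the Python version, and some servers refuse such requests, sometimes by simply closing the connection, which Python reports as ConnectionResetError. So it is the server that resets the connection, but you can usually fix it on your side by setting this header in your request.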
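To see the difference, here is a minimal sketch that compares the User-agent urllib adds by default with one set explicitly on a Request object. It runs offline and sends nothing; the URL is a placeholder assumption. When the request is actually sent, a header set on the Request takes precedence over the opener's default.

```python
import urllib.request

# Placeholder URL (assumption); nothing is sent in this sketch.
URL = "http://example.com/"

# The default opener adds "Python-urllib/<version>" as the User-agent,
# which some servers reject, sometimes by dropping the connection.
default_headers = dict(urllib.request.build_opener().addheaders)
print(default_headers["User-agent"])

# A Request carries its own headers. urllib stores header names with
# str.capitalize(), so the key is looked up as "User-agent".
request = urllib.request.Request(URL, headers={"User-agent": "Python urllib test"})
print(request.get_header("User-agent"))
```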
Change the part of your code that opens and decodes the URL (the inner try/except block) like this:
# Make request object
request = urllib.request.Request(url, headers={"User-agent": "Python urllib test"})
# Open url using request object
response = urllib.request.urlopen(request, timeout=60)
# read response
data = response.read()
# decode your value
try:
    value = data.decode('CP949')
except UnicodeDecodeError:
    value = data.decode('UTF-8')
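The decoding step can also be moved into a helper. This is a sketch, not the answer's exact code: it first tries the charset the server declares in its Content-Type header (via get_content_charset(), which returns None when no charset is given), then UTF-8, then CP949. Trying UTF-8 before CP949 deliberately swaps the answer's order, on the assumption that strict UTF-8 decoding rarely succeeds on non-UTF-8 bytes, while CP949 can occasionally decode UTF-8 bytes into the wrong characters. The names decode_body and fetch_text are hypothetical.

```python
import urllib.request
from typing import Optional


def decode_body(data: bytes, declared: Optional[str] = None) -> str:
    """Decode an HTTP response body (hypothetical helper).

    Tries the charset the server declared, then UTF-8, then CP949.
    """
    candidates = ([declared] if declared else []) + ["utf-8", "cp949"]
    for encoding in candidates:
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # LookupError: the declared charset is unknown to Python.
            continue
    # Nothing fit: continue with U+FFFD replacement characters instead of crashing.
    return data.decode("utf-8", errors="replace")


def fetch_text(url: str) -> str:
    """Fetch a page with a custom User-agent and decode it (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-agent": "Python urllib test"})
    with urllib.request.urlopen(request, timeout=60) as response:
        # get_content_charset() reads the charset from the Content-Type header.
        return decode_body(response.read(), response.headers.get_content_charset())
```

Reading the body once and decoding the same bytes also avoids the second download that the question's except branch triggers.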
You can change "Python urllib test" to anything you want. Many servers also log the User-agent for statistical purposes.
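If every request in a script should carry the same User-agent, one option is to install a global opener whose addheaders replace the default. This is a sketch of an alternative, not something the answer requires; the agent string "my-title-fetcher/0.1" is a made-up example. Note that install_opener affects every later urllib.request.urlopen call in the process, and a header set directly on a Request still wins over addheaders.

```python
import urllib.request

# Build an opener and replace its default ("Python-urllib/x.y") User-agent.
opener = urllib.request.build_opener()
opener.addheaders = [("User-agent", "my-title-fetcher/0.1")]  # hypothetical agent string

# From now on, plain urllib.request.urlopen(url) calls go through this opener,
# so the header does not have to be repeated on every Request.
urllib.request.install_opener(opener)
```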
Finally, consider using appropriate whitespace, blank lines, and comments to make your code more readable. It will help you in the long run.
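Applying that advice to the question's script, here is one possible layout: a sketch, not the only correct structure. It keeps the User-agent fix and the CP949-then-UTF-8 order from the answer; the function names and the script name get_title.py are hypothetical. A few things are deliberately changed as assumptions: the URL comes from the command line instead of input(), os.system("cls") is dropped because it only works on Windows, https:// URLs are no longer rewritten to http:// (urllib can open HTTPS directly when Python has SSL support), and the title regex is case-insensitive and matches across line breaks.

```python
# -*- coding: UTF-8 -*-
import re
import sys
import urllib.error
import urllib.request

# Same header as the answer; the exact string is arbitrary.
HEADERS = {"User-agent": "Python urllib test"}

# Case-insensitive, and "." may match newlines so multi-line titles work.
TITLE_PATTERN = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def normalize_url(url):
    """Add http:// when the user typed a bare host name; keep https:// as-is."""
    url = url.strip()
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    return url


def decode_page(data):
    """Try CP949 first (as in the answer), then UTF-8."""
    try:
        return data.decode("cp949")
    except UnicodeDecodeError:
        # errors="replace" avoids a crash if the page is neither encoding.
        return data.decode("utf-8", errors="replace")


def extract_titles(html):
    """Return the text of every <title> element, trimmed."""
    return [title.strip() for title in TITLE_PATTERN.findall(html)]


def fetch_titles(url):
    """Download the page once with a custom User-agent and return its titles."""
    request = urllib.request.Request(normalize_url(url), headers=HEADERS)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_titles(decode_page(response.read()))


def print_titles(url):
    """Print the titles, or the error if the request fails."""
    try:
        print(fetch_titles(url))
    except ConnectionResetError as e:
        # The server closed the connection (it may still dislike the request).
        print(e)
    except urllib.error.URLError as e:
        # DNS failures, refused connections, HTTP error statuses (HTTPError).
        print(e)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print_titles(sys.argv[1])
    else:
        print("usage: python get_title.py URL")
```

Splitting the work into small functions keeps the network call in one place, so normalize_url, decode_page and extract_titles can be tried on plain strings and bytes without any connection.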
More reading: the urllib.request.Request section of the Python documentation.