Reputation: 11
This is the code for the web browser I'm building:
"import socket
mysock=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
print('Please Enter the website url and port below, else the default port will be used')
print('\rWebsite:')
website=input()
print('\rPORT:')
port=input()
if len(website)<2:
    print("/n You have entered an Invalid Link")
if len(port)<1:
    port=80
while len(website)>1:
    mysock.connect((website, (int(port))))
    cmd=('GET'+website+'HTTP/1.0 \r\n\r\n').encode()
    mysock.send(cmd)
    while True:
        data=mysock.recv(512)
        if len(data) < 1:
            break
        print(data.decode(),end='')
mysock.close()
"
I'm trying to access this link: http://data.pr4e.org/romeo.txt through port 80, but I receive this error:
"Traceback (most recent call last):
  File "C:\Users\PLAY\Documents\python work\browser.py", line 13, in <module>
    mysock.connect((website, (int(port))))
socket.gaierror: [Errno 11001] getaddrinfo failed"
What's going on here and how could I improve it? I'm just messing around with the code a bit and would like to understand it better and make it more functional so I can build on top of it. Thanks in advance.
Upvotes: 1
Views: 2769
Reputation: 597730
You can't pass a whole URL to socket.connect(), only an IP/hostname and a port number. The getaddrinfo failure happens because the entire URL string is being looked up as if it were a hostname. So you need to parse the user's input, break it up into its constituent components, and then act on them.
If the user enters http://data.pr4e.org/romeo.txt, you need to break that up into http, data.pr4e.org, and /romeo.txt. Connect the socket to data.pr4e.org on port 80 (because http is used), and then send a request for GET /romeo.txt HTTP/1.0\r\n\r\n (note the spaces!).
Python has a urllib.parse module for parsing URLs.
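Here is a rough sketch of how the steps above could look with urllib.parse and a plain socket. The function names split_url, build_request, and fetch are just names I made up for illustration, the Host header is my own addition (many servers want it even for HTTP/1.0), and only plain http:// URLs are handled:

```python
import socket
from urllib.parse import urlsplit


def split_url(url):
    """Return (host, port, path) for an http:// URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("only plain http:// URLs are handled in this sketch")
    if not parts.hostname:
        raise ValueError("no hostname found in " + repr(url))
    port = parts.port or 80      # fall back to the default HTTP port
    path = parts.path or "/"     # an empty path means the site root
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path


def build_request(host, path):
    """Build a minimal HTTP/1.0 GET request as bytes (note the spaces)."""
    return ("GET " + path + " HTTP/1.0\r\n"
            "Host: " + host + "\r\n"   # assumption: added for virtual hosts
            "\r\n").encode()


def fetch(url):
    """Send the request over a raw socket and return the raw response bytes."""
    host, port, path = split_url(url)
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(512)
            if not data:         # empty bytes: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

You would then call something like print(fetch("http://data.pr4e.org/romeo.txt").decode(errors="replace")). Keep in mind that the returned bytes still include the HTTP status line and response headers ahead of the body, so a real browser would have to parse those too.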
A better option is to use the urllib.request module, or even the Requests library, instead of handling a socket manually. Let the module/library do all of the URL parsing and socket handling for you: you give it a URL to access, and it gives you back the data downloaded from that location.
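For comparison, a minimal sketch of the urllib.request route might look like this. The fetch_text name and the 10-second timeout are my own choices, and falling back to UTF-8 when the server doesn't declare a charset is an assumption:

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Download a URL and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header if one was sent;
        # otherwise assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_text("http://data.pr4e.org/romeo.txt") should hand back just the body text, with the URL parsing, the connection, and the response headers all handled by the library.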
Upvotes: 1