Sumax

Reputation: 719

Rudimentary Access to webdata (html) through socket (port 80) connection using Python

My question is theoretical. We can use the urllib library (urlopen) to fetch an HTML page, and I understand that data = mysock.recv(512) plays the role of document.read() for the received data (UTF-8 or ASCII).

Which line in the code below plays the role of open('document')? open('document') locates the specified file and checks that it exists. My guess is that mysock.send(cmd) is the equivalent, since it sends the GET request that asks the web server for the specified file at that address.

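import socket

# Create a TCP socket and connect to the web server on port 80.
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
# HTTP ends each line with CRLF; a blank line marks the end of the request.
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

# Read the response in chunks of up to 512 bytes until the server closes.
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')
mysock.close()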

Edit: I seem to have found an answer, but I would still welcome a more thorough explanation.

Upvotes: 1

Views: 194

Answers (2)

ralf htp

Reputation: 9422

You are correct that the HTTP GET method makes the server look up the file. https://medium.com/from-the-scratch/http-server-what-do-you-need-to-know-to-build-a-simple-http-server-from-scratch-d1ef8945e4fa gives an example implementation of the GET method in C:

GET /info.html HTTP/1.1

So, we just have to search for the info.html file in current directory (as / specifies that it is looking in the root directory of the server. If it is like /messages/info.html then we have to look inside messages folder for info.html file).

source : https://medium.com/from-the-scratch/http-server-what-do-you-need-to-know-to-build-a-simple-http-server-from-scratch-d1ef8945e4fa
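To make the quoted step concrete, here is a rough Python sketch of how a server might turn a request line into a file path. It is not taken from the linked article: resolve_request and the /srv/www document root are hypothetical names chosen for illustration, and a real server does considerably more.

```python
from pathlib import Path


def resolve_request(request_line: str, root: Path) -> Path:
    """Map a request line such as 'GET /info.html HTTP/1.1' to a path under root.

    Hypothetical helper for illustration only.
    """
    method, target, _version = request_line.split()
    if method != "GET":
        raise ValueError(f"unsupported method: {method}")
    base = root.resolve()
    # Drop the leading "/" so the target is joined relative to the root.
    candidate = (base / target.lstrip("/")).resolve()
    # Refuse targets such as "/../secret" that would escape the root.
    if candidate != base and base not in candidate.parents:
        raise ValueError("target escapes the document root")
    return candidate


# "/srv/www" is an assumed document root, not something from the question.
print(resolve_request("GET /messages/info.html HTTP/1.1", Path("/srv/www")))
```

Only after this lookup does the server try to open the file, which is where the "does it exist" check actually happens.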

The Linux implementation of the HTTP protocol is similar ...

So mysock.send(cmd) is similar to open(document): it sends the GET request, which causes the server to search for the file and check whether it exists.
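On the client side, the result of that existence check comes back in the status line of the response. A minimal sketch, assuming the whole response has already been collected from recv() into one bytes object; body_or_error is a hypothetical helper, and the sample responses are made up for illustration:

```python
def body_or_error(raw_response: bytes) -> bytes:
    """Return the body of an HTTP response, or raise roughly as open() would.

    Hypothetical helper for illustration only.
    """
    # Headers and body are separated by one blank line (CRLF CRLF).
    head, _, body = raw_response.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    # A status line looks like "HTTP/1.0 200 OK": version, code, reason.
    _version, code, *_reason = status_line.split(" ")
    if code == "404":
        # The server looked for the file and did not find it.
        raise FileNotFoundError(status_line)
    if code != "200":
        raise OSError(status_line)
    return body


# Made-up sample responses, not output captured from data.pr4e.org.
print(body_or_error(b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"))
try:
    body_or_error(b"HTTP/1.0 404 Not Found\r\n\r\n")
except FileNotFoundError as err:
    print("missing:", err)
```

So the "file not found" case surfaces only after send() and recv(), as a 404 status, rather than as an exception from the call that sent the request.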

Upvotes: 0

Sumax

Reputation: 719

After careful study, the right answer to this is: mysock.connect(('data.pr4e.org', 80)) behaves similarly to open('romeo.txt'). The analogy is loose, though: connect() only opens a connection to the host on port 80, whereas open() locates 'romeo.txt' in the given location and checks that it exists.
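A side-by-side sketch may make the comparison easier to see. To keep it runnable without touching data.pr4e.org, it serves a temporary directory from a local http.server instance; read_local, read_remote, demo and the sample file contents are all hypothetical names and data for illustration.

```python
import http.server
import socket
import tempfile
import threading
from functools import partial
from pathlib import Path


def read_local(path):
    f = open(path, "rb")   # names the file AND checks that it exists
    data = f.read()        # reads the contents
    f.close()
    return data


def read_remote(host, port, path):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))  # opens a channel to the server; no file named yet
    sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode())  # names the document
    response = b""
    while True:
        chunk = sock.recv(512)  # plays the role of read(), one chunk at a time
        if len(chunk) < 1:
            break
        response += chunk
    sock.close()
    # Drop the status line and headers; keep only the body.
    _headers, _, body = response.partition(b"\r\n\r\n")
    return body


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        local_path = Path(tmp, "romeo.txt")
        local_path.write_bytes(b"But soft what light through yonder window breaks\n")
        # Local stand-in for the real web server, bound to a free port.
        handler = partial(http.server.SimpleHTTPRequestHandler, directory=tmp)
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            host, port = server.server_address
            return read_local(local_path), read_remote(host, port, "/romeo.txt")
        finally:
            server.shutdown()
            server.server_close()


local, remote = demo()
print(local == remote)
```

Read this way, open() covers what connect() and the GET request do together: connect() reaches the place where documents live, and the request names the document and triggers the existence check.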

Upvotes: 1
