Reputation: 291
How are files downloaded from servers in programming languages like C? I understand higher level languages have magic functions like "download_file_from_url()" but they don't help me understand what is actually going on. I'm a little familiar with sockets but network programming in general is still a black box to me. Thanks for any help.
Upvotes: 3
Views: 3840
Reputation: 1073978
Basically, at a low-ish level, the program is opening a socket to port 80 (usually) on the server and sending it a request that looks something like this:
GET /index.html HTTP/1.1
Host: stackoverflow.com
...followed by a blank line.
The server then responds with the data, which typically consists of a few header lines, a blank line, and the requested resource. With HTTP 1.1 the default is to keep the connection alive for subsequent requests (although the server could terminate it if it liked); if I'd used HTTP 1.0 or added a Connection: close header, the server would break the connection after sending the resource.
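To make the request format concrete, here is a minimal C sketch that builds that request text in a buffer. The function name build_get_request is made up for illustration, and the sketch adds a Connection: close header so the server ends the connection after the response.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from any library): writes an HTTP/1.1 GET
 * request into buf. Returns the request length, or -1 if buf is too small. */
int build_get_request(char *buf, size_t size, const char *host, const char *path)
{
    /* Every line ends with CRLF; the empty line marks the end of the request. */
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Calling build_get_request(buf, sizeof buf, "stackoverflow.com", "/index.html") fills buf with the bytes you would then write to the socket.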
Check out the Wikipedia article on HTTP for details, or if you really want to get into it, check out the spec (all-in-one-page here). You can see what this looks like for yourself if you have telnet (and you probably do). Just type telnet stackoverflow.com 80 and then type in the lines above. Remember to press Enter on the blank line.
You do not want to reinvent this wheel. Virtually all languages and environments have a library available to help you that deals with all of the intricacies. (For instance, try the example above with www.stackoverflow.com instead of stackoverflow.com in both places: you get back a "moved permanently" response because the SO team want SO to be at stackoverflow.com, not www.stackoverflow.com. There are also "moved temporarily" responses, etc., etc.)
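As a rough illustration of one of those intricacies, the sketch below reads the status code from the first response line (301 for "moved permanently", 302 for "moved temporarily") and pulls out the Location header that says where to go next. Both function names are hypothetical, and the code assumes the whole header section is already in a NUL-terminated string.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Hypothetical helper: returns the status code from a response that starts
 * with a line such as "HTTP/1.1 301 Moved Permanently", or -1 on failure. */
int parse_status_code(const char *response)
{
    int major, minor, code;
    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Hypothetical helper: copies the value of the Location header into out.
 * Returns 0 on success, -1 if the header is absent or out is too small. */
int find_location(const char *headers, char *out, size_t size)
{
    const char *line = headers;
    while (line && *line) {
        /* A blank line ends the headers; do not search the body. */
        if (line[0] == '\r' || line[0] == '\n')
            break;
        /* Header names are case-insensitive. */
        if (strncasecmp(line, "Location:", 9) == 0) {
            const char *value = line + 9;
            while (*value == ' ' || *value == '\t')
                value++;
            size_t len = strcspn(value, "\r\n");
            if (len >= size)
                return -1;
            memcpy(out, value, len);
            out[len] = '\0';
            return 0;
        }
        /* Advance to the start of the next line. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

A client that sees 301 or 302 would typically issue a new request to the URL in Location, with a cap on how many redirects it follows.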
Upvotes: 13
Reputation: 9676
If you are downloading a file using HTTP, you should read the RFC on HTTP (how data is split into chunks, etc.); if you are using FTP, read the RFC on FTP (which commands are used, e.g. PWD, CWD, etc.). However, these are higher-level protocols that use sockets anyway.
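To give a feel for what "split into chunks" means in HTTP/1.1: with Transfer-Encoding: chunked, the body arrives as a series of chunks, each a hexadecimal size line followed by that many bytes and a CRLF, ending with a chunk of size 0. The sketch below decodes such a body once it is fully in memory. The function name decode_chunked is hypothetical, and trailers and chunk extensions are ignored for brevity.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decodes a complete chunked body held in memory.
 * Writes the decoded bytes to out and returns their count, or -1 if the
 * input is malformed or out is too small. */
long decode_chunked(const char *in, size_t in_len, char *out, size_t out_size)
{
    size_t pos = 0, written = 0;

    for (;;) {
        /* The size line must be terminated inside the buffer. */
        const char *eol = memchr(in + pos, '\n', in_len - pos);
        if (eol == NULL || !isxdigit((unsigned char)in[pos]))
            return -1;

        /* Chunk size is hexadecimal; parsing stops at ';' or CR. */
        unsigned long size = strtoul(in + pos, NULL, 16);
        pos = (size_t)(eol - in) + 1;

        if (size == 0)
            return (long)written; /* last chunk; trailers are ignored */
        if (size > in_len - pos || size > out_size - written)
            return -1;

        memcpy(out + written, in + pos, size);
        written += size;
        pos += size;

        /* Each chunk's data is followed by CRLF. */
        if (in_len - pos < 2 || in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```

A real client would decode incrementally as data arrives instead of buffering the whole body first.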
Upvotes: 1
Reputation: 129363
To download a file (assume a simple case - no firewall etc...), you need to:
1. Connect to a DNS server to resolve the name of the URL's server into an IP address
2. Open a connection to that IP address on the URL's port, or on the default port for your protocol (80 for HTTP)
3. Send the appropriate HTTP command to that server
4. Listen for the HTTP response
5. Process the response correctly, and if it contains the data for the file, keep reading the response and saving the data to a temp file
6. When the file is fully downloaded, close the connection and move the complete temp file into its proper location.
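The steps above can be sketched in C with POSIX sockets roughly as follows. This is a simplified illustration, not a robust client: the function names and the ".part" temp-file suffix are made up, the request uses HTTP/1.0 so the body simply ends when the server closes the connection, and the status code, redirects, and HTTPS are all ignored.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve the host name and open a TCP connection to it.
 * Returns a connected socket, or -1 on failure. */
int connect_to(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Send a GET request on a connected socket, skip the response headers,
 * and write the body to out. Returns 0 on success, -1 on error. */
int fetch_body(int fd, const char *host, const char *path, FILE *out)
{
    char req[1024], buf[4096];
    const char *sep = "\r\n\r\n"; /* blank line that ends the headers */
    size_t matched = 0;           /* bytes of sep matched so far */
    int in_body = 0;
    ssize_t n;

    int len = snprintf(req, sizeof req,
                       "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
    if (len < 0 || (size_t)len >= sizeof req)
        return -1;
    /* send() may accept fewer bytes than requested, so loop. */
    for (int sent = 0; sent < len; sent += (int)n) {
        n = send(fd, req + sent, (size_t)(len - sent), 0);
        if (n <= 0)
            return -1;
    }
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0) {
        ssize_t i = 0;
        for (; !in_body && i < n; i++) {
            if (buf[i] == sep[matched]) {
                if (++matched == 4)
                    in_body = 1;
            } else {
                matched = (buf[i] == '\r') ? 1 : 0;
            }
        }
        if (in_body && i < n) {
            size_t body_len = (size_t)(n - i);
            if (fwrite(buf + i, 1, body_len, out) != body_len)
                return -1;
        }
    }
    return (n == 0 && in_body) ? 0 : -1;
}

/* Download into "<dest>.part", then rename it to dest once complete. */
int download(const char *host, const char *port, const char *path, const char *dest)
{
    char tmp[1024];
    int fd, rc = -1;
    FILE *out;

    if (snprintf(tmp, sizeof tmp, "%s.part", dest) >= (int)sizeof tmp)
        return -1;
    if ((fd = connect_to(host, port)) < 0)
        return -1;
    if ((out = fopen(tmp, "wb")) != NULL) {
        rc = fetch_body(fd, host, path, out);
        if (fclose(out) != 0)
            rc = -1;
    }
    close(fd);
    if (rc == 0)
        return rename(tmp, dest) == 0 ? 0 : -1;
    remove(tmp);
    return -1;
}
```

A call such as download("stackoverflow.com", "80", "/index.html", "index.html") would exercise every step; writing to a temp file first means a failed transfer never leaves a partial file at the final path.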
Upvotes: 1
Reputation: 27900
And a "black box" is probably a good way to keep it :-)
You do the same thing in C that you would do in "higher level languages" - use a library function that does it for you. (The difference is that the library function isn't a standard built-in part of the language).
One choice for C is libcurl.
Upvotes: 4