Reputation: 3405

C File Input/Output for Unknown File Types: File Copying

I'm having some issues with a networking assignment. The end goal is a C program that fetches a file from a given URL via HTTP and writes it to a given filename. It works fine for most text files, but I'm running into some issues, which I suspect all come from the same root cause.

Here's a quick version of the code I'm using to transfer the data from the network file descriptor to the output file descriptor:

unsigned long content_length; // extracted from HTTP header
unsigned long successfully_read = 0;
while(successfully_read != content_length)
{
  char buffer[2048];
  int extracted = read(connection,buffer,2048);
  fprintf(output_file,buffer);
  successfully_read += extracted;
}

As I said, this works fine for most text files (though the % symbol confuses fprintf, so it would be nice to have a way to deal with that). The problem is that it just hangs forever when I try to get non-text files (a .png is the basic test file I'm working with, but the program needs to be able to handle anything).

I've done some debugging and I know I'm not going over content_length, getting errors during read, or hitting some network bottleneck. I looked around online but all the C file i/o code I can find for binary files seems to be based on the idea that you know how the data inside the file is structured. I don't know how it's structured, and I don't really care; I just want to copy the contents of one file descriptor into another.

Can anyone point me towards some built-in file i/o functions that I can bludgeon into use for that purpose?

Edit: Alternately, is there a standard field in the HTTP header that would tell me how to handle whatever file I'm working with?

Upvotes: 2

Answers (5)

Luiz Menezes

Reputation: 819

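You are opening the file as a text file. On platforms that distinguish text streams from binary streams (Windows, for example), every \n written to a text stream is translated to \r\n, which changes the file's size and corrupts binary data. Try opening the file as binary, and those errors in size should go away.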
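For reference, a minimal sketch of opening the output in binary mode. The helper name save_bytes and its return convention are made up for illustration; the only point is the "wb" mode string. On POSIX systems the "b" has no effect, so it is harmless there.

```c
#include <stdio.h>

/* Hypothetical helper: writes len raw bytes to filename.
 * Returns 0 on success, -1 on failure. */
int save_bytes(const char *filename, const void *data, size_t len)
{
    /* "wb": write, binary. No newline translation is applied. */
    FILE *output_file = fopen(filename, "wb");
    if (output_file == NULL)
        return -1;

    size_t written = fwrite(data, 1, len, output_file);
    int close_failed = (fclose(output_file) != 0);

    return (written == len && !close_failed) ? 0 : -1;
}
```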

Upvotes: 0

Bernd Jendrissek

Reputation: 1088

I bet your program is hanging because it's expecting X bytes but receiving Y instead, with X < Y (most likely, sans compression; PNGs don't compress well with gzip anyway). You'll get the data in chunks [*], and one of those chunks will most likely span content_length, so successfully_read jumps past it and your condition while(successfully_read != content_length) stays true forever.

You could try running your program under strace, or whatever its equivalent is for your OS, if you want to see how your program keeps trying to read data it will never get (because you've likely made an HTTP/1.1 request that holds the connection open, and you haven't made a second request) or keeps reading from a connection that has ended (if the server closes the connection, your repeated calls to read(2) will just return 0, which leaves your still-true loop condition unchanged).
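One way to sidestep the held-open connection is to ask for it to be closed. The sketch below formats an HTTP/1.0 request with Connection: close; build_request and its parameters are hypothetical names, not something from the question. With a request like this the server should close the socket after the body, and it should not answer with chunked transfer coding, though it is still worth checking the response headers.

```c
#include <stdio.h>

/* Hypothetical helper: formats a minimal HTTP/1.0 GET request into buf.
 * Returns the request length, or -1 if buf is too small. */
int build_request(char *buf, size_t size, const char *host, const char *path)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);

    /* snprintf reports the length it wanted; reject truncated output. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

The returned length is what you would pass to write(2) on the socket.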

If you are sending your program's output to stdout, you may find that it produces no output - this can happen if the resource you are retrieving contains no newline or other flush-forcing control characters. Other stdio buffering regimes may apply when output goes to a file. (For example, the file will remain empty until the stdio buffers have accumulated at least 4096 bytes.)
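If you want to watch the output grow while debugging, you can push the stdio buffer out explicitly. A small sketch, with write_now as a made-up helper name; in the finished program a single fclose at the end flushes everything anyway, so flushing after every chunk is only a debugging aid.

```c
#include <stdio.h>

/* Hypothetical helper: writes len bytes, then flushes the stdio buffer
 * so the data reaches the OS right away.
 * Returns 0 on success, -1 on failure. */
int write_now(FILE *output_file, const void *data, size_t len)
{
    if (fwrite(data, 1, len, output_file) != len)
        return -1;
    return (fflush(output_file) == 0) ? 0 : -1;
}
```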

[*] Then there's also Transfer-Encoding: chunked, as @roland-illig alludes to, which will ruin the exact equivalence between content_length (presumably derived from the eponymous HTTP header) and the actual number of bytes transferred over the socket.

Upvotes: 0

Roland Illig

Reputation: 41625

In addition to Seth's answer: unless you are using a third-party library for handling all the HTTP stuff, you need to deal with the Transfer-Encoding header and the possible compression, or at least detect them and throw an error if you don't know how to handle that case.

In general, it may (or may not) be a good idea to parse the HTTP response headers, and only if they contain exclusively stuff that you understand should you continue to interpret the data that follows the header.
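As a rough illustration of the "detect and bail out" approach, here is a sketch that scans an already-received header block for the two headers mentioned above. The function names are invented for this example, it assumes the headers are in one NUL-terminated string with \r\n line endings, and it treats any Transfer-Encoding or Content-Encoding header as unsupported, which is stricter than necessary. strncasecmp comes from POSIX <strings.h>.

```c
#include <string.h>
#include <strings.h>

/* Returns 1 if line starts with the given header name followed by ':'.
 * Header names are case-insensitive in HTTP. */
static int line_has_header(const char *line, const char *name)
{
    size_t n = strlen(name);
    return strncasecmp(line, name, n) == 0 && line[n] == ':';
}

/* Hypothetical check: returns 1 if the body can be copied byte for byte,
 * 0 if a Transfer-Encoding or Content-Encoding header is present. */
int body_is_plain(const char *headers)
{
    const char *line = headers;

    while (line != NULL && *line != '\0') {
        if (line_has_header(line, "Transfer-Encoding") ||
            line_has_header(line, "Content-Encoding"))
            return 0;

        /* Advance to the start of the next header line, if any. */
        const char *end = strstr(line, "\r\n");
        line = (end != NULL) ? end + 2 : NULL;
    }
    return 1;
}
```

If body_is_plain returns 0, the simplest safe reaction is to report an error instead of writing a file you cannot decode.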

Upvotes: 0

ose

Reputation: 4075

A couple things with your code:

For fprintf - you are using the data as the second argument, when in fact it should be the format, and the data should be the third argument. This is why you are getting problems with the % character, and why it is struggling when presented with binary data, because it is expecting a format string.

You need to use a different function, such as fwrite, to output the file.

As a side note this is a bit of a security problem - if you fetch a specially crafted file from the server it is possible to expose random areas of your memory.

Upvotes: 1

Seth Carnegie

Reputation: 75130

You are using the wrong tool for the job. fprintf takes a format string and extra arguments, like this:

fprintf(output_file, "hello %s, today is the %d", cstring, dayoftheweek);

If you pass the second argument from an unknown source (like the web, which you are doing) you can accidentally have %s or %d or other format specifiers in the string. Then fprintf will try to read more arguments than it was passed, and cause undefined behaviour.

Use fwrite for this:

fwrite(buffer, 1, extracted, output_file);
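Dropped into the loop from the question, that could look roughly like the sketch below. connection, output_file, and content_length are the names from the question; the function name copy_body and its return values are assumptions for illustration. Besides the fwrite change, it stops when read fails or returns 0 (connection closed), and it never asks for more bytes than are still expected, so the counter cannot overshoot.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copies exactly content_length bytes from the
 * descriptor to the stream. Returns 0 on success, -1 on a read or
 * write error or if the connection ends early. */
int copy_body(int connection, FILE *output_file, unsigned long content_length)
{
    char buffer[2048];
    unsigned long remaining = content_length;

    while (remaining > 0) {
        /* Never request more than what is left of the body. */
        size_t want = remaining < sizeof buffer ? (size_t)remaining
                                                : sizeof buffer;
        ssize_t extracted = read(connection, buffer, want);

        if (extracted < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;             /* real read error */
        }
        if (extracted == 0)
            return -1;             /* peer closed before the body ended */

        /* fwrite copies raw bytes: no format parsing, NUL bytes are fine. */
        if (fwrite(buffer, 1, (size_t)extracted, output_file)
                != (size_t)extracted)
            return -1;

        remaining -= (unsigned long)extracted;
    }
    return 0;
}
```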

Upvotes: 4
