HMage
HMage

Reputation: 1591

Tried and true simple file copying code in C?

This looks like a simple question, but I didn't find anything similar here.

Since there is no file copy function in C, we have to implement file copying ourselves, but I don't like reinventing the wheel even for trivial stuff like that, so I'd like to ask the cloud:

  1. What code would you recommend for file copying using fopen()/fread()/fwrite()?
    • What code would you recommend for file copying using open()/read()/write()?

This code should be portable (windows/mac/linux/bsd/qnx/younameit), stable, time tested, fast, memory efficient and etc. Getting into specific system's internals to squeeze some more performance is welcomed (like getting filesystem cluster size).

This seems like a trivial question but, for example, source code for CP command isn't 10 lines of C code.

Upvotes: 9

Views: 13659

Answers (7)

Alexei A Smekalkine
Alexei A Smekalkine

Reputation: 26

The accepted answer written by Steve Jessop does not answer to the first part of the quession, Jonathan Leffler do it, but do it wrong: code should be written as

while ((n = fread(buffer, 1, sizeof(buffer), f1)) > 0)
    if (fwrite(buffer, n, 1, f2) != 1)
        /* we got write error here */

/* test ferror(f1) for a read errors */

Explanation:

  1. sizeof(char) = 1 by definition, always: it does not matter how many bits in it, 8 (in most cases), 9, 11 or 32 (on some DSP, for example) — size of char is one. Note, it is not an error here, but an extra code.
  2. The fwrite function writes upto nmemb (second argument) elements of specified size (third argument), it does not required to write exactly nmemb elements. To fix this you must write the rest of the data readed or just write one element of size n — let fwrite do all his work. (This item is in question, should fwrite write all data or not, but in my version short writes impossible until error occurs.)
  3. You should test for a read errors too: just test ferror(f1) at the end of loop.

Note, you probably need to disable buffering on both input and output files to prevent triple buffering: first on read to f1 buffer, second in our code, third on write to f2 buffer:

setvbuf(f1, NULL, _IONBF, 0);
setvbuf(f2, NULL, _IONBF, 0);

(Internal buffers should, probably, be of size BUFSIZ.)

Upvotes: 0

Jonathan Leffler
Jonathan Leffler

Reputation: 753735

This is the function I use when I need to copy from one file to another - with test harness:

/*
@(#)File:           $RCSfile: fcopy.c,v $
@(#)Version:        $Revision: 1.11 $
@(#)Last changed:   $Date: 2008/02/11 07:28:06 $
@(#)Purpose:        Copy the rest of file1 to file2
@(#)Author:         J Leffler
@(#)Modified:       1991,1997,2000,2003,2005,2008
*/

/*TABSTOP=4*/

#include "jlss.h"
#include "stderr.h"

#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_fcopy_c[] = "@(#)$Id: fcopy.c,v 1.11 2008/02/11 07:28:06 jleffler Exp $";
#endif /* lint */

void fcopy(FILE *f1, FILE *f2)
{
    char            buffer[BUFSIZ];
    size_t          n;

    while ((n = fread(buffer, sizeof(char), sizeof(buffer), f1)) > 0)
    {
        if (fwrite(buffer, sizeof(char), n, f2) != n)
            err_syserr("write failed\n");
    }
}

#ifdef TEST

int main(int argc, char **argv)
{
    FILE *fp1;
    FILE *fp2;

    err_setarg0(argv[0]);
    if (argc != 3)
        err_usage("from to");
    if ((fp1 = fopen(argv[1], "rb")) == 0)
        err_syserr("cannot open file %s for reading\n", argv[1]);
    if ((fp2 = fopen(argv[2], "wb")) == 0)
        err_syserr("cannot open file %s for writing\n", argv[2]);
    fcopy(fp1, fp2);
    return(0);
}

#endif /* TEST */

Clearly, this version uses file pointers from standard I/O and not file descriptors, but it is reasonably efficient and about as portable as it can be.


Well, except the error function - that's peculiar to me. As long as you handle errors cleanly, you should be OK. The "jlss.h" header declares fcopy(); the "stderr.h" header declares err_syserr() amongst many other similar error reporting functions. A simple version of the function follows - the real one adds the program name and does some other stuff.

#include "stderr.h"
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

void err_syserr(const char *fmt, ...)
{
    int errnum = errno;
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    if (errnum != 0)
        fprintf(stderr, "(%d: %s)\n", errnum, strerror(errnum));
    exit(1);
}

The code above may be treated as having a modern BSD license or GPL v3 at your choice.

Upvotes: 5

Mandrake
Mandrake

Reputation: 361

the size of each read need to be a multiple of 512 ( sector size ) 4096 is a good one

Upvotes: 2

Steve Jessop
Steve Jessop

Reputation: 279255

As far as the actual I/O goes, the code I've written a million times in various guises for copying data from one stream to another goes something like this. It returns 0 on success, or -1 with errno set on error (in which case any number of bytes might have been copied).

Note that for copying regular files, you can skip the EAGAIN stuff, since regular files are always blocking I/O. But inevitably if you write this code, someone will use it on other types of file descriptors, so consider it a freebie.

There's a file-specific optimisation that GNU cp does, which I haven't bothered with here, that for long blocks of 0 bytes instead of writing you just extend the output file by seeking off the end.

void block(int fd, int event) {
    pollfd topoll;
    topoll.fd = fd;
    topoll.events = event;
    poll(&topoll, 1, -1);
    // no need to check errors - if the stream is bust then the
    // next read/write will tell us
}

int copy_data_buffer(int fdin, int fdout, void *buf, size_t bufsize) {
    for(;;) {
       void *pos;
       // read data to buffer
       ssize_t bytestowrite = read(fdin, buf, bufsize);
       if (bytestowrite == 0) break; // end of input
       if (bytestowrite == -1) {
           if (errno == EINTR) continue; // signal handled
           if (errno == EAGAIN) {
               block(fdin, POLLIN);
               continue;
           }
           return -1; // error
       }

       // write data from buffer
       pos = buf;
       while (bytestowrite > 0) {
           ssize_t bytes_written = write(fdout, pos, bytestowrite);
           if (bytes_written == -1) {
               if (errno == EINTR) continue; // signal handled
               if (errno == EAGAIN) {
                   block(fdout, POLLOUT);
                   continue;
               }
               return -1; // error
           }
           bytestowrite -= bytes_written;
           pos += bytes_written;
       }
    }
    return 0; // success
}

// Default value. I think it will get close to maximum speed on most
// systems, short of using mmap etc. But porters / integrators
// might want to set it smaller, if the system is very memory
// constrained and they don't want this routine to starve
// concurrent ops of memory. And they might want to set it larger
// if I'm completely wrong and larger buffers improve performance.
// It's worth trying several MB at least once, although with huge
// allocations you have to watch for the linux 
// "crash on access instead of returning 0" behaviour for failed malloc.
#ifndef FILECOPY_BUFFER_SIZE
    #define FILECOPY_BUFFER_SIZE (64*1024)
#endif

int copy_data(int fdin, int fdout) {
    // optional exercise for reader: take the file size as a parameter,
    // and don't use a buffer any bigger than that. This prevents 
    // memory-hogging if FILECOPY_BUFFER_SIZE is very large and the file
    // is small.
    for (size_t bufsize = FILECOPY_BUFFER_SIZE; bufsize >= 256; bufsize /= 2) {
        void *buffer = malloc(bufsize);
        if (buffer != NULL) {
            int result = copy_data_buffer(fdin, fdout, buffer, bufsize);
            free(buffer);
            return result;
        }
    }
    // could use a stack buffer here instead of failing, if desired.
    // 128 bytes ought to fit on any stack worth having, but again
    // this could be made configurable.
    return -1; // errno is ENOMEM
}

To open the input file:

int fdin = open(infile, O_RDONLY|O_BINARY, 0);
if (fdin == -1) return -1;

Opening the output file is tricksy. As a basis, you want:

int fdout = open(outfile, O_WRONLY|O_BINARY|O_CREAT|O_TRUNC, 0x1ff);
if (fdout == -1) {
    close(fdin);
    return -1;
}

But there are confounding factors:

  • you need to special-case when the files are the same, and I can't remember how to do that portably.
  • if the output filename is a directory, you might want to copy the file into the directory.
  • if the output file already exists (open with O_EXCL to determine this and check for EEXIST on error), you might want to do something different, as cp -i does.
  • you might want the permissions of the output file to reflect those of the input file.
  • you might want other platform-specific meta-data to be copied.
  • you may or may not wish to unlink the output file on error.

Obviously the answers to all these questions could be "do the same as cp". In which case the answer to the original question is "ignore everything I or anyone else has said, and use the source of cp".

Btw, getting the filesystem's cluster size is next to useless. You'll almost always see speed increasing with buffer size long after you've passed the size of a disk block.

Upvotes: 3

T.E.D.
T.E.D.

Reputation: 44804

One thing I found when implementing my own file copy, and it seems obvious but it's not: I/O's are slow. You can pretty much time your copy's speed by how many of them you do. So clearly you need to do as few of them as possible.

The best results I found were when I got myself a ginourmous buffer, read the entire source file into it in one I/O, then wrote the entire buffer back out of it in one I/O. If I even had to do it in 10 batches, it got way slow. Trying to read and write out each byte, like a naieve coder might try first, was just painful.

Upvotes: 1

David Cournapeau
David Cournapeau

Reputation: 80710

Depending on what you mean by copying a file, it is certainly far from trivial. If you mean copying the content only, then there is almost nothing to do. But generally, you need to copy the metadata of the file, and that's surely platform dependent. I don't know of any C library which does what you want in a portable manner. Just handling the filename by itself is no trivial matter if you care about portability.

In C++, there is the file library in boost

Upvotes: 1

merkuro
merkuro

Reputation: 6177

Here is a very easy and clear example: Copy a file. Since it is written in ANSI-C without any particular function calls I think this one would be pretty much portable.

Upvotes: 1

Related Questions