stands2reason

Reputation: 730

Fastest way to copy data from one file to another in C/C++?

In my code, I have a situation where I need to copy data from one file to another. The solution I came up with looks like this:

const int BUF_SIZE = 1024;
char buf[BUF_SIZE];

int left_to_copy = toCopy;
while(left_to_copy > BUF_SIZE)
{
    fread(buf, BUF_SIZE, 1, fin);
    fwrite(buf, BUF_SIZE, 1, fout);
    left_to_copy -= BUF_SIZE;
}

fread(buf, left_to_copy, 1, fin);
fwrite(buf, left_to_copy, 1, fout);

My main thought was that there might be something like memcpy, but for data in files. I just give it two file streams and the total number of bytes. I searched a bit but I couldn't find any such thing.

But if something like that isn't available, what buffer size should I use to make the transfer fastest? Bigger would mean fewer system calls, but I figured it could mess up other buffering or caching on the system. Should I dynamically allocate the buffer so it only takes one pair of read/write calls? Typical transfer sizes in this particular case are from a few KB to a dozen or so MB.

EDIT: For OS specific information, we're using Linux.

EDIT2:

I tried using sendfile, but it didn't work. It seemed to write the right amount of data, but it was garbage.

I replaced my example above with something that looks like this:

fflush(fin);
fflush(fout);
off_t offset = ftello64(fin);
sendfile(fileno(fout), fileno(fin), &offset, toCopy);
fseeko64(fin, offset, SEEK_SET);

I added the flush, offset, and seek one at a time since it didn't appear to be working.

Upvotes: 3

Views: 12518

Answers (5)

Rémi

Reputation: 3744

On Linux, copy_file_range(2) might be even better than sendfile(2):

https://unix.stackexchange.com/questions/771238/linux-syscalls-advantage-of-copy-file-range-over-sendfile
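
A rough sketch of what that could look like, assuming glibc 2.27 or later (which provides the copy_file_range wrapper) and descriptors that are already positioned where the copy should start. The names copy_range and copy_chunk are made up for this example; the read/write fallback is there because the call can be refused on some kernels and filesystems.

```c
#define _GNU_SOURCE          /* exposes copy_file_range() in glibc 2.27+ */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical fallback: move up to `count` bytes through a user-space buffer. */
static ssize_t copy_chunk(int in_fd, int out_fd, size_t count)
{
    char buf[65536];
    ssize_t got = read(in_fd, buf, count < sizeof buf ? count : sizeof buf);
    for (ssize_t done = 0; done < got; ) {
        ssize_t put = write(out_fd, buf + done, (size_t)(got - done));
        if (put < 0)
            return -1;
        done += put;
    }
    return got;   /* bytes copied, 0 at end of input, -1 on read error */
}

/* Hypothetical helper: copy `count` bytes between the current offsets of two
 * descriptors. Returns 0 on success, -1 on error or premature end of input. */
int copy_range(int in_fd, int out_fd, size_t count)
{
    while (count > 0) {
        /* NULL offsets: the kernel uses, and advances, each descriptor's
         * own file offset. The call may copy fewer bytes than requested. */
        ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL, count, 0);
        if (n < 0 && (errno == ENOSYS || errno == EXDEV ||
                      errno == EINVAL || errno == EOPNOTSUPP))
            n = copy_chunk(in_fd, out_fd, count);   /* not supported here */
        if (n < 0)
            perror("copy_range");
        if (n <= 0)
            return -1;
        count -= (size_t)n;
    }
    return 0;
}
```

If you are working with FILE streams as in the question, flush them and pass fileno(stream), keeping in mind that the stream's own buffer does not know about data moved behind its back.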

Upvotes: 0

Kijewski

Reputation: 26033

You need to tell us your (desired) OS. The appropriate calls (or rather best fitting calls) will be very system specific.

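In Linux/*BSD/Mac you would use sendfile(2), which handles the copying in kernel space. The signature and restrictions differ between these systems; the excerpt below is from the Linux man page.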

SYNOPSIS

 #include <sys/sendfile.h>
 ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

DESCRIPTION

sendfile() copies data between one file descriptor and another.  Because this
copying is done within the kernel, sendfile() is more efficient than the
combination of read(2) and write(2), which would require transferring data to
and from user space.

in_fd should be a file descriptor opened for reading and out_fd should be a
descriptor opened for writing.
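
For copying between two regular files opened as FILE streams, a minimal sketch might look like the following. It assumes Linux 2.6.33 or later (earlier kernels require out_fd to be a socket), an output stream that is not in append mode, and an output stream whose last operation was a write or that is freshly opened. The name stream_sendfile is made up for this example. The main point is that the stdio buffers and the kernel file offsets have to be brought back in sync around the call, and that sendfile may transfer fewer bytes than requested.

```c
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy `count` bytes from the current position of fin
 * to the current position of fout. Returns 0 on success, -1 on failure. */
int stream_sendfile(FILE *fin, FILE *fout, size_t count)
{
    off_t in_pos = ftello(fin);              /* where stdio thinks fin is */
    if (in_pos < 0 || fflush(fout) != 0)     /* write out fout's buffer first */
        return -1;

    int in_fd = fileno(fin);
    int out_fd = fileno(fout);
    while (count > 0) {
        /* With a non-NULL offset, sendfile reads from in_pos and updates it,
         * leaving in_fd's own offset alone; out_fd's offset does advance. */
        ssize_t n = sendfile(out_fd, in_fd, &in_pos, count);
        if (n <= 0)
            return -1;                       /* error, or input ended early */
        count -= (size_t)n;
    }

    /* Tell both streams where the descriptors ended up. */
    off_t out_pos = lseek(out_fd, 0, SEEK_CUR);
    if (out_pos < 0 || fseeko(fin, in_pos, SEEK_SET) != 0)
        return -1;
    return fseeko(fout, out_pos, SEEK_SET);
}
```

After a successful call, further fread and fwrite calls on the two streams continue right after the copied region.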

Further reading:

Server portion of sendfile example:

/*

Server portion of sendfile example.

usage: server [port]

Copyright (C) 2003 Jeff Tranter.


This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

*/


#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <netinet/in.h>


int main(int argc, char **argv)
{
  int port = 1234;           /* port number to use */
  int sock;                  /* socket descriptor */
  int desc;                  /* file descriptor for socket */
  int fd;                    /* file descriptor for file to send */
  struct sockaddr_in addr;   /* socket parameters for bind */
  struct sockaddr_in addr1;  /* socket parameters for accept */
  socklen_t addrlen;         /* argument to accept */
  struct stat stat_buf;      /* argument to fstat */
  off_t offset = 0;          /* file offset */
  char filename[PATH_MAX];   /* filename to send */
  int rc;                    /* holds return code of system calls */

  /* check command line arguments, handling an optional port number */
  if (argc == 2) {
    port = atoi(argv[1]);
    if (port <= 0) {
      fprintf(stderr, "invalid port: %s\n", argv[1]);
      exit(1);
    }
  } else if (argc != 1) {
    fprintf(stderr, "usage: %s [port]\n", argv[0]);
    exit(1);
  }

  /* create Internet domain socket */
  sock = socket(AF_INET, SOCK_STREAM, 0);
  if (sock == -1) {
    fprintf(stderr, "unable to create socket: %s\n", strerror(errno));
    exit(1);
  }

  /* fill in socket structure */
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = INADDR_ANY;
  addr.sin_port = htons(port);

  /* bind socket to the port */
  rc =  bind(sock, (struct sockaddr *)&addr, sizeof(addr));
  if (rc == -1) {
    fprintf(stderr, "unable to bind to socket: %s\n", strerror(errno));
    exit(1);
  }

  /* listen for clients on the socket */
  rc = listen(sock, 1);
  if (rc == -1) {
    fprintf(stderr, "listen failed: %s\n", strerror(errno));
    exit(1);
  }

  while (1) {

    /* wait for a client to connect; addrlen must hold the buffer size on entry */
    addrlen = sizeof(addr1);
    desc = accept(sock, (struct sockaddr *)&addr1, &addrlen);
    if (desc == -1) {
      fprintf(stderr, "accept failed: %s\n", strerror(errno));
      exit(1);
    }

    /* get the file name from the client, leaving room for the terminator */
    rc = recv(desc, filename, sizeof(filename) - 1, 0);
    if (rc == -1) {
      fprintf(stderr, "recv failed: %s\n", strerror(errno));
      exit(1);
    }

    /* null terminate and strip any \r and \n from filename */
    filename[rc] = '\0';
    filename[strcspn(filename, "\r\n")] = '\0';

    /* exit server if filename is "quit" */
    if (strcmp(filename, "quit") == 0) {
      fprintf(stderr, "quit command received, shutting down server\n");
      break;
    }

    fprintf(stderr, "received request to send file %s\n", filename);

    /* open the file to be sent */
    fd = open(filename, O_RDONLY);
    if (fd == -1) {
      fprintf(stderr, "unable to open '%s': %s\n", filename, strerror(errno));
      exit(1);
    }

    /* get the size of the file to be sent */
    fstat(fd, &stat_buf);

    /* copy file using sendfile */
    offset = 0;
    rc = sendfile (desc, fd, &offset, stat_buf.st_size);
    if (rc == -1) {
      fprintf(stderr, "error from sendfile: %s\n", strerror(errno));
      exit(1);
    }
    if (rc != stat_buf.st_size) {
      fprintf(stderr, "incomplete transfer from sendfile: %d of %d bytes\n",
              rc,
              (int)stat_buf.st_size);
      exit(1);
    }

    /* close descriptor for file that was sent */
    close(fd);

    /* close socket descriptor */
    close(desc);
  }

  /* close socket */
  close(sock);
  return 0;
}

Upvotes: 13

john

Reputation: 1393

As far as fast reading is concerned, I think you can also opt for mapping the files: memory-mapped I/O using mmap (see the manual page for mmap). It is considered more efficient than conventional I/O, especially when dealing with large files.

mmap doesn't actually read the file. It just maps it into the address space. That's why it's so fast: there is no disk I/O until you actually access that region of the address space.

Alternatively, you can look up the block size first and read in multiples of it, which is also considered efficient because the requests then line up with the I/O units the filesystem prefers.
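
On POSIX systems the preferred block size is reported in the st_blksize field filled in by fstat. A small sketch of picking a buffer size from it follows; the helper name block_aligned_size and the 4096-byte fallback are assumptions made for this example, not something the system prescribes.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: smallest multiple of fd's preferred I/O block size
 * that is at least min_size bytes (and at least one block). Assumes a
 * 4096-byte block if fstat fails or reports nothing useful. */
size_t block_aligned_size(int fd, size_t min_size)
{
    struct stat st;
    size_t blk = 4096;                       /* assumed fallback */
    if (fstat(fd, &st) == 0 && st.st_blksize > 0)
        blk = (size_t)st.st_blksize;

    size_t blocks = (min_size + blk - 1) / blk;   /* round up */
    if (blocks == 0)
        blocks = 1;
    return blocks * blk;
}
```

The result can be passed to malloc to size the copy buffer. Whether it beats a fixed size such as 64 KB is something to measure on the target system.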

Upvotes: 0

George D Girton

Reputation: 823

It might be worthwhile considering memory-mapped file I/O for your target operating system. For the file sizes you are talking about, this is a viable way to go, and the OS will optimize better than you can. If you want to write code that is portable across operating systems, this may not be the best approach, though.

This will require some setting up, but once you have it set up, you can forget about the loop code, and it will basically look like a memcpy.
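
To give an idea of the setup involved, here is a minimal POSIX sketch. The name mmap_copy is made up for this example, and it makes several simplifying assumptions: the copy starts at offset 0 of both files (mmap offsets must be page-aligned, so an arbitrary starting position needs extra arithmetic), the input file is at least len bytes long (touching pages past its end raises SIGBUS), and the output descriptor is open for both reading and writing.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the first `len` bytes of in_fd to the start of
 * out_fd through two mappings. Returns 0 on success, -1 on failure. */
int mmap_copy(int in_fd, int out_fd, size_t len)
{
    if (len == 0)
        return 0;                            /* mmap rejects empty mappings */
    if (ftruncate(out_fd, (off_t)len) != 0) {
        perror("ftruncate");                 /* output size becomes exactly len */
        return -1;
    }

    void *src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in_fd, 0);
    if (src == MAP_FAILED) {
        perror("mmap input");
        return -1;
    }
    void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, out_fd, 0);
    if (dst == MAP_FAILED) {
        perror("mmap output");
        munmap(src, len);
        return -1;
    }

    memcpy(dst, src, len);                   /* the copy itself */

    munmap(src, len);
    return munmap(dst, len);
}
```

Note that ftruncate sets the output file to exactly len bytes, so this version is only suitable when the output file should contain nothing but the copied data.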

Upvotes: 3

Jonathan Wood

Reputation: 67273

One thing you could do is increase the size of your buffer. That could help if you have large files.

Another thing is to call directly to the OS, whatever that may be in your case. There is some overhead in fread() and fwrite().

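If you could use unbuffered routines and provide your own larger buffer, you may see some noticeable performance improvements.

I'd recommend using the return value from fread() to get the number of bytes actually read, so you can track when you're done. Pass 1 as the element size so that the return value is a byte count rather than a count of whole buffers.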
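
A sketch of the loop from the question rewritten along those lines, staying with stdio. The 64 KB buffer size and the name copy_stream are assumptions for this example; the right size is best found by measuring.

```c
#include <stdio.h>

enum { COPY_BUF_SIZE = 64 * 1024 };   /* assumed size; tune by measuring */

/* Hypothetical helper: copy up to `to_copy` bytes from fin to fout.
 * Returns the number of bytes actually written; a short count means end
 * of input or an error (check feof() and ferror() to tell which). */
size_t copy_stream(FILE *fin, FILE *fout, size_t to_copy)
{
    char buf[COPY_BUF_SIZE];
    size_t copied = 0;

    while (copied < to_copy) {
        size_t want = to_copy - copied;
        if (want > sizeof buf)
            want = sizeof buf;

        /* Element size 1, so the return values are byte counts. */
        size_t got = fread(buf, 1, want, fin);
        if (got == 0)
            break;                           /* end of input or read error */

        size_t put = fwrite(buf, 1, got, fout);
        copied += put;
        if (put < got)
            break;                           /* write error */
    }
    return copied;
}
```

The same structure carries over to read() and write() on file descriptors if you want to bypass the stdio layer entirely.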

Upvotes: 2
