mav

Reputation: 1248

Fast file copy with progress

I'm writing an SDL application for Linux that runs from the console (no X server). One of its functions copies specific files from the HDD to a USB flash device and shows the progress of the copy in the UI. To get the progress, I use a simple while loop that copies the file in 8 kB chunks. The problem is that it's slow: copying a 100 MB file takes nearly 10 minutes, which is unacceptable.

How can I implement a faster file copy? I was thinking about an asynchronous API that reads the file from the HDD into a buffer and writes the data to USB in a separate thread, but I don't know whether I should implement this myself, because it doesn't look like an easy task. Do you know of a C++ API or library that can do that for me? Or is there some other, better method?

Upvotes: 2

Views: 7234

Answers (3)

Alex

Reputation: 2965

Here is an example using boost::filesystem or std::filesystem.

Because copy_file() blocks and doesn't provide a progress interface, I use a separate thread that checks the size of the destination file and prints the progress.

In a UI framework, I would run both threads in the following code separately from the main UI thread and change the print statements to post the percentage to the main UI thread.

It's also a good idea to exclude small files from this progress calculation entirely.

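#include <boost/filesystem.hpp>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

using namespace boost::filesystem;
using boost::system::error_code;

double now() {  // seconds from a monotonic clock, independent of the clock's tick unit
  const auto since_epoch = std::chrono::steady_clock::now().time_since_epoch();
  return std::chrono::duration<double>(since_epoch).count();
}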

// 
// Calculate progress and throughput
//
struct ProgressTimer {
  int64_t current_size = 0;
  double  start_ts     = now();
  double  last_ts      = now();

  float Mbps       = 0;
  float final_Mbps = 0;
  float percentage = 0;

  bool update(int64_t new_size, int64_t source_size) {
    if (new_size == current_size) return false;
    double ts    = now();
    Mbps         = ((new_size - current_size) * 8 / 1000000.0) / (ts - last_ts);
    current_size = new_size;
    last_ts      = ts;
    percentage   = 100 * float(current_size) / source_size;
    final_Mbps   = (current_size * 8 / 1000000.0) / (last_ts - start_ts);
    return true;
  }
};

//
// Run periodic check for file size while copy operation is ongoing
//
class SizeMonitor {
 public:
  SizeMonitor(int64_t source_size, boost::filesystem::path destination) {
    this->source_size = source_size;
    this->destination = destination;
    thread = std::thread([this]() { this->run(); });
  }
  void stop() {
    running = false;
    this->thread.join();
  }

 protected:
  std::thread       thread;
  std::atomic<bool> running{true};  // written by stop(), read by run(); volatile does not synchronize threads
  int64_t       source_size;

  boost::filesystem::path destination;

  void run() {
    auto          sleep = [&]() { std::this_thread::sleep_for(std::chrono::milliseconds(16)); };
    error_code    ec;
    ProgressTimer progress;
    while (running) {
      int64_t latest_size = file_size(destination, ec);
      if (ec.failed()) {
        // Expect `system:2` error on first call to file_size as this thread is started before copy operation
        std::cout << "Thread Error: file_size(" << destination.native() << ") " << ec.to_string() << std::endl;
        sleep();
        continue;
      }
      if (!progress.update(latest_size, source_size)) {
        // Expect same size on second call to file_size as data is about to be flushed to disk
        std::cout << "Thread: same size " << progress.current_size << std::endl;
        sleep();
        continue;
      }
      std::cout << "Thread: size " << progress.current_size << " progress: " << progress.percentage << " Mbps=" << progress.Mbps << std::endl;
      sleep();
    }
    progress.update(source_size, source_size);
    // Expect slight calculation error for final_Mbps as sleep() above was not interrupted
    std::cout << "Final Mbps=" << progress.final_Mbps << " time=" << (progress.last_ts - progress.start_ts) << std::endl;
  }
};

//
// Create file copy monitor then run copy operation
// In a UI application this thread shouldn't be the main thread, and SizeMonitor::run should post updates to the main thread
//
int main() {
  boost::filesystem::path source("large_video_file.mkv");
  boost::filesystem::path destination("_test_large_file.mkv");
  // stat source file
  error_code              ec;
  if (false == exists(source, ec) || ec.failed()) {
    std::cout << "Error: source file missing, " << source.native() << std::endl;
    return -1;
  }
  int64_t source_size = file_size(source, ec);
  if (ec.failed()) {
    std::cout << "Error: file_size(" << source.native() << ") " << ec.to_string() << std::endl;
    return -1;
  }
  // delete leftover dest file from previous run, ignore if it fails
  remove(destination, ec);
  // start monitor thread
  SizeMonitor  print_progress(source_size, destination);
  // copy file
  copy_options op     = copy_options::overwrite_existing;
  const bool   status = copy_file(source, destination, op, ec);
  std::cout << "status=" << status << " ec=" << ec.to_string() << std::endl;
  // join monitor thread
  print_progress.stop();
  // delete leftover dest file
  remove(destination, ec);
  return 0;
}

Upvotes: -1

Adam Rosenfield

Reputation: 400194

Don't synchronously update your UI with the copy progress; that will slow things down considerably. You should run the file copy on a separate thread from the main UI thread so that the file copy can proceed as fast as possible without impeding the responsiveness of your application. Then, the UI can update itself at the natural rate (e.g. at the refresh rate of your monitor).

You should also use a larger buffer size than 8 KB. Experiment around, but I think you'll get faster results with larger buffer sizes (e.g. in the 64-128 KB range).

So, it might look something like this:

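#include <fcntl.h>
#include <pthread.h>
#include <sys/stat.h>
#include <unistd.h>

#include <atomic>

#define BUFSIZE (64*1024)

// Shared with the UI thread; std::atomic (C++11) avoids the data race that volatile alone would leave
std::atomic<off_t> progress, max_progress;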

void *thread_proc(void *arg)
{
    // Error checking omitted for expository purposes
    char buffer[BUFSIZE];
    int in = open("source_file", O_RDONLY);
    // O_CREAT requires a third argument giving the permissions of the new file
    int out = open("destination_file", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    // Get the input file size
    struct stat st;
    fstat(in, &st);

    progress = 0;
    max_progress = st.st_size;

    ssize_t bytes_read;
    while((bytes_read = read(in, buffer, BUFSIZE)) > 0)
    {
        write(out, buffer, bytes_read);  // write only the bytes actually read; the last chunk is usually short
        progress += bytes_read;
    }

    // copy is done, or an error occurred
    close(in);
    close(out);

    return 0;
}

void start_file_copy()
{
    pthread_t t;
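    pthread_create(&t, NULL, &thread_proc, 0);
    pthread_detach(t);  // the thread is never joined, so detach it to release its resources when it exits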
}

// In your UI thread's repaint handler, use the values of progress and
// max_progress

Note that if you are sending a file to a socket instead of another file, you should instead use the sendfile(2) system call, which copies the file directly in kernel space without round tripping into user space. Of course, if you do that, you can't get any progress information, so that may not always be ideal.

For Windows systems, you should use CopyFileEx, which is both efficient and provides you a progress callback routine.

Upvotes: 5

Amirshk

Reputation: 8258

Let the OS do all the work:

  1. Map the file into memory with mmap, which will drastically speed up the reading process.
  2. Save it to a file using msync.

Upvotes: 4
