Reputation: 19267
I'm transitioning a large file copy operation from NSStream to a dispatch I/O implementation with GCD.
When concatenating two 1GB files into a single 2GB file, the GCD implementation consumes 2GB of memory; the NSStream implementation consumes just 50MB.
In Instruments, I can see start_wqthread calls allocating 1MB chunks, matching the block size I requested via the dispatch I/O high-water mark, but instead of being freed after they are written to the output channel, they hang around.
How can I free each buffer after it has been written to the output channel?
If I create a brand-new OS X Cocoa application in Xcode and paste the following code into the applicationDidFinishLaunching:
method, it consumes 500-2000MB of memory. (To test, replace the temp file references with local file references.)
In a new project built against the OS X 10.9 SDK and targeting 10.9, ARC forbids calls to dispatch_release().
In an older project targeting OS X 10.6, dispatch_release() is allowed even with ARC enabled, but it has no effect on the memory footprint.
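For reference, a common way to keep such code building in both configurations is to guard the release with the OS_OBJECT_USE_OBJC macro from &lt;os/object.h&gt;, which libdispatch defines when dispatch objects are ARC-managed Objective-C objects. This is a sketch, not part of the question's code, and the queue name is hypothetical:

dispatch_queue_t queue = dispatch_queue_create("com.example.IO", DISPATCH_QUEUE_SERIAL);
// ... use the queue ...
#if !OS_OBJECT_USE_OBJC
dispatch_release(queue); // only needed (and only legal) when dispatch objects are plain C objects
#endif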
NSArray* files = @[@"/1GBFile.tmp", @"/1GBFile2.tmp"];
NSString* outFile = @"/outFile.tmp";
NSString* queueName = [NSString stringWithFormat:@"%@.IO", [[NSBundle mainBundle].infoDictionary objectForKey:(id)kCFBundleIdentifierKey]];
dispatch_queue_t queue = dispatch_queue_create(queueName.UTF8String, DISPATCH_QUEUE_SERIAL);
dispatch_io_t io_write = dispatch_io_create_with_path(DISPATCH_IO_STREAM, outFile.UTF8String, (O_RDWR | O_CREAT | O_APPEND), (S_IWUSR | S_IRUSR | S_IRGRP | S_IROTH), queue, NULL);
dispatch_io_set_high_water(io_write, 1024*1024);
[files enumerateObjectsUsingBlock:^(NSString* file, NSUInteger idx, BOOL *stop) {
    dispatch_io_t io_read = dispatch_io_create_with_path(DISPATCH_IO_STREAM, file.UTF8String, O_RDONLY, 0, queue, NULL);
    dispatch_io_set_high_water(io_read, 1024*1024);
    dispatch_io_read(io_read, 0, SIZE_MAX, queue, ^(bool done, dispatch_data_t data, int error) {
        if (error) {
            dispatch_io_close(io_write, 0);
            return;
        }
        if (data) {
            size_t bytesRead = dispatch_data_get_size(data);
            if (bytesRead > 0) {
                dispatch_io_write(io_write, 0, data, queue, ^(bool doneWriting, dispatch_data_t dataToBeWritten, int errorWriting) {
                    if (errorWriting) {
                        dispatch_io_close(io_read, DISPATCH_IO_STOP);
                    }
                });
            }
        }
        if (done) {
            dispatch_io_close(io_read, 0);
            if (files.count == (idx+1)) {
                dispatch_io_close(io_write, 0);
            }
        }
    });
}];
Upvotes: 4
Views: 1692
Reputation: 19267
I believe I've worked out a solution using a dispatch group.
The code copies each file in sequence, blocking the loop from moving to the next file until the previous one has been completely read and written, while still letting the individual read and write operations be queued asynchronously.
I believe the memory over-consumption was due to reads for multiple files being queued simultaneously. I would have thought a serial queue would handle that, but blocking progress with a dispatch group, so that only the work to read and write a single file is queued at a time, does the trick. With the following code, peak memory usage is ~7MB.
Now a single input file is queued for reading, each read operation queues its corresponding write operation, and the loop over the input files is blocked until all reading and writing is complete.
NSArray* files = @[@"/1GBFile.tmp", @"/1GBFile2.tmp"];
NSString* outFile = @"/outFile.tmp";
NSString* queueName = [NSString stringWithFormat:@"%@.IO", [[NSBundle mainBundle].infoDictionary objectForKey:(id)kCFBundleIdentifierKey]];
dispatch_queue_t queue = dispatch_queue_create(queueName.UTF8String, DISPATCH_QUEUE_SERIAL);
dispatch_group_t group = dispatch_group_create();
dispatch_io_t io_write = dispatch_io_create_with_path(DISPATCH_IO_STREAM, outFile.UTF8String, (O_RDWR | O_CREAT | O_APPEND), (S_IWUSR | S_IRUSR | S_IRGRP | S_IROTH), queue, NULL);
dispatch_io_set_high_water(io_write, 1024*1024);
[files enumerateObjectsUsingBlock:^(NSString* file, NSUInteger idx, BOOL *stop) {
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    if (*stop) {
        return;
    }
    dispatch_group_enter(group);
    dispatch_io_t io_read = dispatch_io_create_with_path(DISPATCH_IO_STREAM, file.UTF8String, O_RDONLY, 0, queue, NULL);
    dispatch_io_set_high_water(io_read, 1024*1024);
    dispatch_io_read(io_read, 0, SIZE_MAX, queue, ^(bool done, dispatch_data_t data, int error) {
        if (error || *stop) {
            dispatch_io_close(io_write, 0);
            *stop = YES;
            return;
        }
        if (data) {
            size_t bytesRead = dispatch_data_get_size(data);
            if (bytesRead > 0) {
                dispatch_group_enter(group);
                dispatch_io_write(io_write, 0, data, queue, ^(bool doneWriting, dispatch_data_t dataToBeWritten, int errorWriting) {
                    if (errorWriting || *stop) {
                        dispatch_io_close(io_read, DISPATCH_IO_STOP);
                        *stop = YES;
                        dispatch_group_leave(group);
                        return;
                    }
                    if (doneWriting) {
                        dispatch_group_leave(group);
                    }
                });
            }
        }
        if (done) {
            dispatch_io_close(io_read, 0);
            if (files.count == (idx+1)) {
                dispatch_io_close(io_write, 0);
            }
            dispatch_group_leave(group);
        }
    });
}];
Upvotes: 2
Reputation: 47189
I'm not sure what [self cleanUpAndComplete];
is, but it doesn't appear that you ever call dispatch_io_close for the other channels you've created (only io_read).
From the dispatch_io_create documentation:
The returned object is retained before it is returned; it is your responsibility to close the channel and then release this object when you are done using it.
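Applied to the question's code, that would mean giving each channel the same close-then-release treatment. A sketch, assuming manually managed dispatch objects (i.e. OS_OBJECT_USE_OBJC is 0); under ARC-managed dispatch objects only the close calls apply:

// once a file's reads are finished:
dispatch_io_close(io_read, 0);
// once the final write has completed:
dispatch_io_close(io_write, 0);
#if !OS_OBJECT_USE_OBJC
dispatch_release(io_read);   // release follows close, per the documentation
dispatch_release(io_write);
#endif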
Upvotes: 1