Poor IO performance under heavy load

Question

It seems that I have a problem with Linux IO performance. In my project I need to zero out an entire file from kernel space. I use the following code pattern:

for_each_mapping_page(mapping, index) {
    page = read_mapping_page(mapping, index, NULL);
    lock_page(page);
    /* kmap(), memset() the page to zero, kunmap() */
    set_page_dirty(page);
    write_one_page(page, 1);    /* wait == 1: blocks until this page is written */
    page_cache_release(page);
    cond_resched();
}

Everything works fine, but with large files (~3 GB+ in my case) the system stalls in a strange manner: until the operation completes I can't start anything new. In other words, all processes that existed before the operation began keep running fine, but if I try to launch something while it is in progress, nothing happens until it has completed.

Is this a kernel IO scheduling issue, or have I missed something? And how can I fix it?

Thanks.

UPD:

Following Kristof's suggestion I've reworked my code, and now it looks like this:

headIndex = soff >> PAGE_CACHE_SHIFT;
tailIndex = eoff >> PAGE_CACHE_SHIFT;

/**
 * doing the exact @headIndex .. @tailIndex range
 */

for (index = headIndex; index < tailIndex; index += nr_pages) {
    nr_pages = min_t(int, ARRAY_SIZE(pages), tailIndex - index);

    for (i = 0; i < nr_pages; i++) {
        pages[i] = read_mapping_page(mapping, index + i, NULL);
        if (IS_ERR(pages[i])) {
            /* record the error before unwinding the pages already read */
            result = PTR_ERR(pages[i]);
            while (i--)
                page_cache_release(pages[i]);
            goto return_result;
        }
    }

    for (i = 0; i < nr_pages; i++)
        zero_page_atomic(pages[i]);

    /* widen to loff_t before shifting so offsets past 4 GiB cannot overflow */
    result = filemap_write_and_wait_range(mapping,
                          (loff_t)index << PAGE_CACHE_SHIFT,
                          ((loff_t)(index + nr_pages) << PAGE_CACHE_SHIFT) - 1);

    for (i = 0; i < nr_pages; i++)
        page_cache_release(pages[i]);

    if (result)
        goto return_result;

    if (fatal_signal_pending(current))
        goto return_result;

    cond_resched();
}

As a result I get better IO performance, but concurrent disk access by the same user that started the operation still suffers while the heavy IO is in progress.
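The page-index arithmetic of the loop above can be checked in user space. The following is only a sketch under stated assumptions, not the kernel code: `SKETCH_PAGE_SHIFT` (4 KiB pages), `SKETCH_BATCH` (standing in for `ARRAY_SIZE(pages)`), `struct batch` and `split_into_batches()` are all hypothetical names introduced for illustration. It shows which byte ranges each batch would hand to `filemap_write_and_wait_range()`.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12. */
#define SKETCH_PAGE_SHIFT 12
/* Hypothetical stand-in for ARRAY_SIZE(pages) in the loop above. */
#define SKETCH_BATCH 16

struct batch {
    unsigned long long first_index; /* first page index of the batch */
    unsigned long long nr_pages;    /* number of pages in the batch */
    long long start_byte;           /* first byte written back (inclusive) */
    long long end_byte;             /* last byte written back (inclusive) */
};

/*
 * Split the byte range [soff, eoff) into batches the same way the loop
 * does: page indices run from soff >> shift up to, but not including,
 * eoff >> shift. Stores at most max batches in out[] and returns how
 * many were stored.
 */
static size_t split_into_batches(long long soff, long long eoff,
                                 struct batch *out, size_t max)
{
    unsigned long long index = (unsigned long long)soff >> SKETCH_PAGE_SHIFT;
    unsigned long long tail = (unsigned long long)eoff >> SKETCH_PAGE_SHIFT;
    size_t n = 0;

    while (index < tail && n < max) {
        unsigned long long left = tail - index;
        unsigned long long nr = left < SKETCH_BATCH ? left : SKETCH_BATCH;

        out[n].first_index = index;
        out[n].nr_pages = nr;
        /* 64-bit arithmetic, so offsets beyond 4 GiB do not wrap */
        out[n].start_byte = (long long)(index << SKETCH_PAGE_SHIFT);
        out[n].end_byte = (long long)((index + nr) << SKETCH_PAGE_SHIFT) - 1;

        n++;
        index += nr;
    }
    return n;
}
```

With `soff = 0` and `eoff` equal to 40 pages plus 100 bytes, the sketch yields three batches of 16, 16 and 8 pages. The trailing 100 bytes fall outside every batch, because `eoff >> shift` rounds down, so a partial last page would need separate handling.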

Anyway, thanks for the suggestions.

Kristof Provost · Accepted Answer

In essence you're bypassing the kernel's IO scheduler completely.

If you look at the ext2 implementation you'll see it never (well ok, once) calls write_one_page(). For large-scale data transfers it uses mpage_writepages() instead.

This uses the block I/O interface rather than immediately accessing the hardware, which means the writes pass through the IO scheduler. Large operations will not block the entire system, as the scheduler will automatically ensure that other operations are interleaved with the large writes.
