HBv6

Reputation: 3537

How can multiple writes to a file be faster than a single one?

First piece of code:

// code is a private field of the class
// SourceCodeBuilder is a class that wraps a StringBuilder:
// it builds formatted Strings through many appends, one per "loc()" call (see below)
private SourceCodeBuilder code = new SourceCodeBuilder();

[...]

    // create "file.txt" and call algorithm
    fileOut = new FileWriter("file.txt");

    for (int i=0; i<x; i++) {
        algorithm();
    }

Where algorithm() is a method like this:

private void algorithm () {
    for (int i=0; i<y; i++) {
        code.loc("some text");
        code.loc("other text");
        ...
    }

    // after "building" the code value, write it to the file
    fileOut.write(code.toString());
    fileOut.flush();
    code.free(); // "empties" the code variable, so the next call to algorithm() starts with code set to "" (this frees a lot of memory)
                 // basically it calls the "setLength(0)" method of StringBuilder
}

When I run this on large text files, it takes about 4500 ms to execute and uses less than 60 MB of memory.

Then I tried to use this other code. Second piece of code:

private SourceCodeBuilder code = new SourceCodeBuilder();

[...]

    // create "file.txt" and call algorithm
    fileOut = new FileWriter("file.txt");

    for (int i=0; i<x; i++) {
        algorithm();
    }

    fileOut.write(code.toString());
    fileOut.flush();
    fileOut.close();

Where this time algorithm() is a method like this:

private void algorithm () {
    for (int i=0; i<y; i++) {
        code.loc("some text");
        code.loc("other text");
        ...
    }
}

It uses more than 250 MB of memory (which is expected, because I never call the "free()" method on the code variable, so it is one continuous append to the same variable), but surprisingly it takes more than 5300 ms to execute. That's roughly 18% slower than the first code, and I can't explain why.

In the first code I write small pieces of text to "file.txt" many times. In the second code I write one big piece of text to "file.txt", only once, and use more memory. With the second code I expected higher memory consumption, but not a longer run time, since it is the first code that performs more I/O operations.

Conclusion: the first piece of code is faster than the second one, even though the first one does more I/O operations than the second one. Why? Am I missing something?

Upvotes: 2

Views: 202

Answers (2)

Peter Lawrey

Reputation: 533620

Every system call has an overhead, which you can avoid by using a BufferedWriter (or a buffered reader or stream). This is why you would use buffering.

In the second case you are buffering the entire contents before writing it. In the first case you are writing the file a little bit at a time, which will result in many more system calls and thus more overhead.

If you were to generate the file faster, you might find that almost all your time is spent in system calls.

The reason you stream data in blocks (using buffering) is so that you don't use so much memory. i.e. there is a point at which larger buffers slow you down rather than helping.

In your case I suspect you are writing to a StringBuilder or StringWriter (which uses StringBuffer) and this must be copied as it is resized to be the size you eventually need. This creates a bit of GC overhead which results in more copying as well.
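As a rough illustration of the buffering idea above, here is a minimal sketch that writes each chunk through a BufferedWriter and reuses one StringBuilder, so memory stays bounded while the writer decides when to actually hit the file. The class name BufferedChunkWriter, the method writeChunks, and the temp file are hypothetical; a plain StringBuilder stands in for the asker's SourceCodeBuilder, whose real API is not shown.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedChunkWriter {

    // Builds `chunks` blocks of text and writes each one as soon as it is complete.
    // The StringBuilder is reset after every chunk, so it never grows past one chunk.
    public static void writeChunks(Path target, int chunks, int linesPerChunk) throws IOException {
        StringBuilder code = new StringBuilder(); // stand-in for SourceCodeBuilder (assumption)
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int i = 0; i < chunks; i++) {
                for (int j = 0; j < linesPerChunk; j++) {
                    code.append("some text\n");
                    code.append("other text\n");
                }
                out.write(code.toString()); // no flush() here: the writer flushes when it needs to
                code.setLength(0);          // reuse the same backing array for the next chunk
            }
        } // try-with-resources flushes and closes the writer
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("file", ".txt");
        long start = System.nanoTime();
        writeChunks(target, 100, 10_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("wrote " + Files.size(target) + " bytes in " + elapsedMs + " ms");
    }
}
```

Whether this beats either version in the question depends on the data and the machine, so it is worth measuring rather than assuming.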

Upvotes: 4

Sergey Kalinichenko

Reputation: 726779

When you are slowly filling a large memory buffer, the time required for that grows non-linearly, because you need to re-allocate the buffer multiple times, each time copying the entire content to a new location in memory. This takes time, especially when the buffer is 200MB+. If you preallocate the buffer, your process may go faster.
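To test the preallocation idea, a small sketch like the following compares a StringBuilder created with the default capacity against one created with the final size up front. The class PreallocatedBuilder and the method build are hypothetical names, and the line count is an arbitrary assumption; this is a crude timing, not a proper benchmark.

```java
class PreallocatedBuilder {

    // Appends `lines` copies of a 10-character line to a builder
    // that starts with the given initial capacity.
    public static StringBuilder build(int lines, int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (int i = 0; i < lines; i++) {
            sb.append("some text\n");
        }
        return sb;
    }

    public static void main(String[] args) {
        int lines = 2_000_000;         // arbitrary size chosen for illustration
        int expectedChars = lines * 10; // "some text\n" is 10 characters long

        long t0 = System.nanoTime();
        StringBuilder grown = build(lines, 16); // must be reallocated and copied as it grows
        long t1 = System.nanoTime();
        StringBuilder preallocated = build(lines, expectedChars); // sized once, never copied
        long t2 = System.nanoTime();

        System.out.println("default capacity: " + (t1 - t0) / 1_000_000 + " ms, final capacity " + grown.capacity());
        System.out.println("preallocated:     " + (t2 - t1) / 1_000_000 + " ms, final capacity " + preallocated.capacity());
    }
}
```

A single run like this is affected by JIT warm-up and GC timing, so treat the numbers as a hint and confirm with a profiler.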

However, all the above is just my guess. You should profile your application to see where the additional time really goes.

Upvotes: 3
