Arpssss

Reputation: 3858

Java IO Performance Issue

I am using:

PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("test.txt"), 1024 * 1024 * 500));

to write a large file (approx. 2 GB). It takes 26 seconds to write. But when I replace 500 with 10 or 20, it takes 19 seconds.

From here, what I understood is that buffering gives better performance. If so, why is this happening? I checked by running each version 5 times, so system/IO load is not an issue.
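For context, this is roughly the shape of the timing loop I mean (the data and file name here are placeholders, not my real workload):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class BufferSizeTest {
    public static void main(String[] args) throws IOException {
        int bufferSize = 1024 * 1024 * 500; // also tried 1024 * 1024 * 10 and 1024 * 1024 * 20
        long start = System.nanoTime();
        try (PrintWriter out = new PrintWriter(
                new BufferedWriter(new FileWriter("test.txt"), bufferSize))) {
            for (int i = 0; i < 50_000_000; i++) { // roughly 2 GB of text output
                out.println("some line of test data, number " + i);
            }
        }
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        System.out.println("buffer size " + bufferSize + " -> " + seconds + " s");
    }
}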

Upvotes: 4

Views: 1601

Answers (6)

Peter Lawrey

Reputation: 533492

As I said in a previous question, there is an optimal buffer size (typically around 32 KB), and making the buffer larger than this makes it slower, not faster. The default buffer size is 8 KB.

BTW: how large is your L2/L3 CPU cache? (About 10 MB, I suspect.) Your L1 cache is probably about 32 KB.

By using a buffer which fits into the fastest cache, you are using the fastest memory. By using a buffer which only fits in main memory, you are using the slowest memory (as much as 10x slower).


In answer to your question.

What I do is assume ISO-8859-1 encoding, i.e. (byte) ch, and write one byte at a time to a ByteBuffer, possibly a memory-mapped one.

I have methods for writing/reading long and double to/from a ByteBuffer without creating any garbage.

https://github.com/peter-lawrey/Java-Chronicle/blob/master/src/main/java/vanilla/java/chronicle/impl/AbstractExcerpt.java

Using this approach you can log about 5 million lines per second to disk.
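
A minimal sketch of the idea (not the Chronicle code itself; the file name, region size, and helper methods are just illustrative):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedWriteDemo {
    public static void main(String[] args) throws Exception {
        long size = 64L * 1024 * 1024; // 64 MB mapped region for the demo
        try (RandomAccessFile raf = new RandomAccessFile("mapped.dat", "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer bb = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            long timestamp = System.nanoTime();
            writeAscii(bb, "ts=");    // text written as single bytes, no String garbage
            writeLong(bb, timestamp); // number written digit by digit, no boxing/formatting
            bb.put((byte) '\n');
        }
    }

    // Write each char as one byte (valid for ISO-8859-1 text).
    static void writeAscii(MappedByteBuffer bb, CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            bb.put((byte) s.charAt(i));
        }
    }

    // Write a long as decimal digits without creating intermediate objects.
    static void writeLong(MappedByteBuffer bb, long value) {
        if (value < 0) {
            bb.put((byte) '-');
            value = -value;
        }
        if (value >= 10) {
            writeLong(bb, value / 10);
        }
        bb.put((byte) ('0' + (value % 10)));
    }
}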

Upvotes: 3

Stephen C

Reputation: 718758

Buffering your I/O improves performance up to a point by reducing the number of system calls made. But system calls are not that expensive (maybe a millisecond or so), and an overly large buffer could cause problems in other areas. For example:

  • A 500 Mbyte buffer uses a lot of memory, and potentially increases GC overheads, or increases the system's paging load.

  • If you write 500 Mbytes in a single write call, the write could saturate the system's buffer cache and overwhelm its ability to overlap disc writes with doing other things at the application level.

Just try using a (significantly) smaller buffer. (I personally wouldn't use a buffer bigger than 8 KB without doing some application-specific tuning.)
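
For example, just dropping the explicit size falls back to BufferedWriter's 8 KB default, which is a sensible baseline before you measure anything (a trivial sketch; the file name is a placeholder):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class DefaultBufferExample {
    public static void main(String[] args) throws IOException {
        // No explicit size: BufferedWriter's default buffer is 8 KB (8192 chars),
        // a sane baseline before any application-specific tuning.
        try (PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("test.txt")))) {
            out.println("hello");
        }
    }
}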

Upvotes: 1

chubbsondubs

Reputation: 38676

First, you really don't need a buffer that big. Generally 64 KB, or even as low as 8 KB, is sufficient to get decent I/O performance. Any larger and you're just wasting memory and CPU, because as the buffer grows you spend more and more time at the I/O layer writing one big chunk of data. So it's a trade-off (a min-max problem) between waiting on I/O and just writing to memory. You can't shove huge buffers at the I/O device, because it has an internal fixed-size buffer of its own. The point is to match it as best as possible, while realizing that's practically impossible because you don't know what other processes are doing. The best thing to do is try something low (8-16 KB), run it, measure it. Double the buffer to 32 KB and so on, run it, measure it. If you get a speed improvement, do it again. Once you stop getting speed improvements, divide by 2 and stop (a rough sketch of that loop is below).

So if you wrote 2 GB of data in 26 s, that's a throughput of about 77 MB/s, or roughly 615 Mbit/s. You could probably improve it just by lowering the buffer size to something reasonable.
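
A rough sketch of that double-and-measure loop (the file name, line count, and upper bound are arbitrary; a serious benchmark would also warm up the JVM and average several runs):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class BufferTuner {
    public static void main(String[] args) throws IOException {
        long best = Long.MAX_VALUE;
        int bestSize = 0;
        // Start low (8 KB), double, measure; stop when doubling no longer helps.
        for (int size = 8 * 1024; size <= 1024 * 1024; size *= 2) {
            long elapsed = timeWrite(size);
            System.out.println(size / 1024 + " KB -> " + elapsed / 1_000_000 + " ms");
            if (elapsed < best) {
                best = elapsed;
                bestSize = size;
            } else {
                break; // no further improvement: the previous size was the sweet spot
            }
        }
        System.out.println("best buffer size: " + bestSize / 1024 + " KB");
    }

    static long timeWrite(int bufferSize) throws IOException {
        long start = System.nanoTime();
        try (PrintWriter out = new PrintWriter(
                new BufferedWriter(new FileWriter("tune.txt"), bufferSize))) {
            for (int i = 0; i < 1_000_000; i++) {
                out.println("a representative line of output, number " + i);
            }
        }
        return System.nanoTime() - start;
    }
}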

Upvotes: 1

Lucas

Reputation: 5069

An overly large buffer decreases performance. Stick to around 32-64 KB, IMO.

Upvotes: 2

Martijn Courteaux

Reputation: 68847

Very large buffers (500 MB) are also not good, because it becomes harder for the OS to manage memory for such a huge buffer.

Compare it to moving a table in your house instead of moving a box: a bigger load, but far more awkward to handle. On the other hand, if your boxes become too small, you will have to make many trips back and forth.

Don't forget that allocating memory is an O(n) operation.

Upvotes: 1

mcfinnigan

Reputation: 11638

1024*1024*500 is 500 megabytes, give or take a smidgen. You're basically forcing the JVM to allocate a 500 MB block of contiguous memory, which it probably has to run a GC cycle to do.

Upvotes: 1
