Reputation:
I have written some POSIX programs that make use of memory-mapped file buffers. One simple scenario is to map a 1 GB file into memory and fill the entire file with content.
During program execution there was little to no disk I/O until the msync or munmap call happened.
On exactly the same system, I wrote the equivalent program in Java running on Oracle JDK 7, and noticed a massive amount of disk I/O throughout the entire program execution.
How is the memory-mapped file buffer implemented differently in the JVM? And is there any way to delay the massive I/O activity?
The operating system is a Linux 3.2 x64.
Code:
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class Main {
    public static void main(String[] args) throws Exception {
        long size = 1024 * 1048576; // 1 GiB
        RandomAccessFile raf = new RandomAccessFile("mmap1g", "rw");
        FileChannel fc = raf.getChannel();
        // Map the whole file read-write; the file grows to `size` if it is shorter.
        MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_WRITE, 0, size);
        // Fill every byte of the mapping.
        for (long i = 0; i < size; ++i)
            buf.put((byte) 1);
    }
}
Upvotes: 8
Views: 2088
Reputation: 533530
Memory mapping is entirely implemented in the OS. The JVM has no say in how it is flushed to disk except by means of the force() method and the "rws" option when you open the file.
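As a minimal sketch of the force() route mentioned above: the class name ForceDemo, the helper fillAndForce and the temp-file name are made up for illustration, and the size is kept to 1 MiB so the example runs quickly. It maps a file, dirties every byte, and then asks the OS to write the mapped region out before the channel is closed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

final class ForceDemo {
    // Maps `size` bytes of `file`, fills them with `value`, and forces the
    // dirty pages out to the storage device before returning.
    // (Hypothetical helper, not part of any library.)
    static void fillAndForce(Path file, int size, byte value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel fc = raf.getChannel()) {
            MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (int i = 0; i < size; i++) {
                buf.put(value);
            }
            // Explicit flush; without it the kernel decides when to write back.
            buf.force();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("force-demo", ".bin");
        file.toFile().deleteOnExit();
        fillAndForce(file, 1 << 20, (byte) 1);
        System.out.println("file size after force(): " + Files.size(file));
    }
}
```

Without the force() call the program behaves the same from the JVM's point of view; the only difference is that writeback timing is then left entirely to the kernel.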
Linux will flush to disk based on the kernel parameters set in sysctl.
$ sysctl -a | grep dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
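These are the defaults on my laptop. The vm.dirty_background_ratio of 10 means the kernel will start writing data to disk in the background when dirty pages reach 10% of available memory. The vm.dirty_ratio of 20 means that once dirty pages reach 20%, the writing program itself is blocked until the dirty percentage drops below 20%. In any case, dirty data older than 3000 centiseconds (30 seconds) becomes eligible for writeback, and the kernel flusher threads wake up every 500 centiseconds (5 seconds) to write it out.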
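To make the two ratios concrete, here is a small arithmetic sketch. The 8 GiB figure is a hypothetical amount of memory, and the class and method names are invented for the example. The kernel computes its thresholds against available (free plus reclaimable) memory rather than installed RAM, so treat the results as rough estimates only.

```java
final class DirtyThresholds {
    // Bytes of dirty page cache corresponding to `ratioPercent` of `memoryBytes`.
    // (Illustrative helper; the kernel's own accounting is more involved.)
    static long thresholdBytes(long memoryBytes, int ratioPercent) {
        return Math.multiplyExact(memoryBytes, (long) ratioPercent) / 100;
    }

    public static void main(String[] args) {
        long memory = 8L << 30; // assumed 8 GiB of memory counted by the kernel
        long background = thresholdBytes(memory, 10); // vm.dirty_background_ratio
        long blocking = thresholdBytes(memory, 20);   // vm.dirty_ratio
        System.out.println("background writeback starts near " + (background >> 20) + " MiB dirty");
        System.out.println("writers are throttled near " + (blocking >> 20) + " MiB dirty");
    }
}
```

With numbers in this range, a program that dirties 1 GB of mapped pages can cross the background threshold well before it finishes, which would be consistent with seeing disk activity during the run.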
An interesting comparison is to memory map a file on a tmpfs filesystem. I have /tmp mounted as tmpfs, but most systems have /dev/shm.
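One way to run that comparison is sketched below. MapFillTimer and timeFill are hypothetical names, the default directories are assumptions about a typical Linux machine, and the 256 MiB size is chosen to fit comfortably in most /dev/shm mounts (filling a tmpfs mapping beyond its capacity can crash the process). The timing is a crude wall-clock measurement, not a proper benchmark.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class MapFillTimer {
    // Creates a temp file in `dir`, maps `size` bytes of it, fills the mapping,
    // and returns the elapsed time in nanoseconds. The file is removed afterwards.
    static long timeFill(Path dir, int size) throws IOException {
        Path file = Files.createTempFile(dir, "mmap", ".bin");
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel fc = raf.getChannel()) {
            long start = System.nanoTime();
            MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_WRITE, 0, size);
            while (buf.hasRemaining()) {
                buf.put((byte) 1);
            }
            return System.nanoTime() - start;
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        // Directories to compare; pass your own as arguments if these do not apply.
        String[] dirs = args.length > 0 ? args : new String[] {"/dev/shm", "."};
        for (String d : dirs) {
            long millis = timeFill(Paths.get(d), 256 << 20) / 1_000_000;
            System.out.printf("%s: %d ms%n", d, millis);
        }
    }
}
```

Watching a tool such as iostat while each directory is being filled shows whether the disk is involved; on tmpfs there is no backing device to write to.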
BTW, you might find this class interesting. MemoryStore allows you to map any size of memory, i.e. >> 2 GB, and perform thread-safe operations on it, e.g. you can share the memory across processes. It supports off-heap locks, volatile read/write, ordered writes and CAS.
I have a test where two processes lock, toggle and unlock records, and the latency is 50 ns on average on my laptop.
BTW2: Linux has sparse files, which means you can map in regions not only larger than your main memory, but larger than your free disk space. E.g. if you map in 8 TB and only use random pieces totalling 4 GB, it will use up to 4 GB in memory and 4 GB on disk. If you use du {file} you can see the actual space used. Note: lazy allocation of disk space can lead to highly fragmented files, which can be a performance problem for HDDs.
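A small sketch of the sparse-file behaviour, under the assumption that the underlying filesystem supports sparse files: the names SparseMapDemo and touchOnePage are invented for the example, and the 1 GiB logical size is deliberately modest. The file is extended to its logical size without writing data, and only a single 4 KiB window is mapped and dirtied.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

final class SparseMapDemo {
    // Extends `file` to `logicalSize` bytes, then maps and dirties one 4 KiB
    // window starting at `offset`. Returns the file's logical length.
    static long touchOnePage(Path file, long logicalSize, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel fc = raf.getChannel()) {
            raf.setLength(logicalSize); // no data written; blocks are allocated lazily
            MappedByteBuffer window = fc.map(FileChannel.MapMode.READ_WRITE, offset, 4096);
            window.put(0, (byte) 1);    // dirty a single page
            window.force();
            return fc.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sparse-demo", ".bin");
        long logical = touchOnePage(file, 1L << 30, 512L << 20);
        System.out.println("logical size: " + logical + " bytes");
        System.out.println("compare with: du -h " + file + " and ls -l " + file);
    }
}
```

On a filesystem with sparse-file support, du should report far less than the logical size that ls shows, since only the touched page needs disk blocks.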
Upvotes: 10