Reputation: 66156
Let's say one program is reading file F.txt, and another program is writing to this file at the same moment.
Thinking about how I would implement this if I were a systems programmer, I realize there can be ambiguity in:
what will the first program see?
where does the second program write new bytes? (i.e. write "in place" vs write to a new file and then replace the old file with the new one)
how many programs can write to the same file simultaneously?
... and maybe something less obvious.
So, my questions are:
what are the main strategies for implementing file reads and writes?
which of them are supported on which OS (Windows, Linux, Mac OS etc)?
can it depend on the programming language? (I suppose Java may try to provide unified behavior on all supported OSs)
Upvotes: 5
Views: 2447
Reputation:
There can be as many strategies as there are file systems. Generally, the OS tries to avoid I/O operations by caching file data in memory before it is synchronized with the disk. A read served from that buffer sees the data most recently written to it. So between the software and the hardware there is a layer of buffering (e.g. the MySQL MyISAM engine relies heavily on this layer).
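The JVM hands its own buffered data to the OS when a stream is flushed or closed, and asks the OS to write it to disk when a program invokes a method such as FileDescriptor.sync() or FileChannel.force() (the Java counterparts of fsync()),
but the OS may also synchronize its buffers on its own when they exceed defined thresholds. In the JVM this API is of course the same on all supported OSs.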
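To make the buffering layer visible from Java, here is a minimal sketch. The class name BufferVisibilityDemo, the method observeSizes and the temporary file are made up for illustration. It records the file size an outside reader would see before the stream is flushed, after flush(), and after FileDescriptor.sync().

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferVisibilityDemo {

    /** Returns the file size seen from outside: before flush, after flush, after sync. */
    public static long[] observeSizes(Path file) throws IOException {
        long[] sizes = new long[3];
        try (FileOutputStream fos = new FileOutputStream(file.toFile());
             OutputStream out = new BufferedOutputStream(fos, 8192)) {
            out.write("hello".getBytes(StandardCharsets.US_ASCII));
            // The 5 bytes still sit in the JVM-side buffer; the OS has not seen them yet.
            sizes[0] = Files.size(file);

            out.flush();
            // The bytes are now in the OS cache: visible to readers, not necessarily on disk.
            sizes[1] = Files.size(file);

            fos.getFD().sync();
            // sync() asks the OS to push its buffers for this file to the device.
            sizes[2] = Files.size(file);
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("F", ".txt"); // hypothetical scratch file
        try {
            long[] s = observeSizes(file);
            System.out.println(s[0] + " " + s[1] + " " + s[2]);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

What a concurrent reader sees therefore depends on which buffer the writer's bytes have reached. Until flush() they exist only inside the writing process.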
Upvotes: 0
Reputation: 32923
A single byte read has a long journey, from the magnetic platter or flash cell to your local Java variable. Along the way it passes through the OS page cache and a user space read buffer (for a C program, the fopen() read buffer).
For performance reasons, most of the file buffering done by the OS is kept in the Page Cache, which stores the contents of recently read and written files in RAM.
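That means that every buffered read and write operation in your Java code goes from and to your local buffer:
import java.io.*;
// BufferedInputStream supplies the user space buffer; a bare FileInputStream issues one system call per read().
InputStream in = new BufferedInputStream(new FileInputStream("/home/vz0/F.txt"));
// This byte comes from the user space buffer.
int oneByte = in.read();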
A page is usually a single block of 4KB of memory. Every page has some special flags and attributes, one of them being the "dirty" flag, which means that the page holds modified data not yet written to physical media.
Some time later, when the OS decides to flush the dirty data back to the disk, it sends the data in the opposite direction from where it came.
Whenever two distinct processes write data to the same file, the resulting behaviour depends on whether they write to the same region of the file.
A "region" depends on the internal buffer sizes that your application uses. For example, on a two-megabyte file, two distinct processes may each write a small buffer to a different part of the file.
Buffer overlapping and data corruption would occur only when the local buffer is two megabytes in size. In Java you can use channel I/O (java.nio.channels) to read and write files with fine-grained control of what's going on inside.
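As a rough sketch of the distinct-regions case with channel I/O: two writers, each with its own FileChannel, write a small buffer at opposite ends of a two-megabyte file. The class name DistinctRegionsDemo, the 1 KB chunk size and the helper names are assumptions for illustration. The two writers run one after the other in a single process here, standing in for two separate processes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class DistinctRegionsDemo {
    public static final int FILE_SIZE = 2 * 1024 * 1024; // the two-megabyte file
    public static final int CHUNK = 1024;                // assumed buffer size per writer

    /** Fills CHUNK bytes starting at offset with value, through a channel of its own. */
    public static void writeRegion(Path file, long offset, byte value) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            byte[] data = new byte[CHUNK];
            Arrays.fill(data, value);
            ByteBuffer buf = ByteBuffer.wrap(data);
            long pos = offset;
            while (buf.hasRemaining()) {
                // Positional write: goes to an explicit offset, channel position untouched.
                pos += ch.write(buf, pos);
            }
        }
    }

    /** Reads the single byte stored at offset. */
    public static byte readByteAt(Path file, long offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer one = ByteBuffer.allocate(1);
            ch.read(one, offset);
            return one.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("F", ".txt"); // hypothetical scratch file
        file.toFile().deleteOnExit();
        Files.write(file, new byte[FILE_SIZE]);                 // two megabytes of zeros
        writeRegion(file, 0, (byte) 'A');                       // first writer: start of file
        writeRegion(file, FILE_SIZE - CHUNK, (byte) 'B');       // second writer: end of file
        System.out.println((char) readByteAt(file, 0) + " "
                + (char) readByteAt(file, FILE_SIZE - 1) + " "
                + readByteAt(file, CHUNK));                     // untouched middle stays 0
    }
}
```

Because the two byte ranges never overlap, neither writer can clobber the other's data, whatever order the writes land in.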
Many transactional databases force some writes from the local RAM buffers back to disk by issuing a sync operation. All the data related to a single file gets flushed back to the magnetic platters or flash cells, so that data already synced is not lost on power failure.
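In Java, one way to issue such a sync is FileChannel.force(). The sketch below appends a record and returns only after the force call completes. The class name DurableAppendDemo and the method appendDurably are hypothetical, and this is not how any particular database does it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DurableAppendDemo {

    /** Appends record to log and asks the OS to flush it to the storage device. */
    public static void appendDurably(Path log, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf); // lands in the page cache first
            }
            ch.force(true);    // true: flush file metadata as well as content
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("journal", ".log"); // hypothetical scratch file
        log.toFile().deleteOnExit();
        appendDurably(log, "begin;");
        appendDurably(log, "commit;");
        System.out.println(new String(Files.readAllBytes(log), StandardCharsets.UTF_8));
    }
}
```

The force call is what makes each append expensive, which is presumably why databases sync only some writes and not every one.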
Finally, a memory mapped file is a region of memory that enables a user process to read and write directly from and to the page cache, bypassing the user space buffering.
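A small sketch of a memory mapped file in Java, using FileChannel.map(). The class name MappedFileDemo, the method name and the 4096-byte mapping size are assumptions for illustration. It stores one byte through the mapping and then reads it back with an ordinary channel read. How quickly changes made through a mapping become visible to other views of the file is system dependent, so the sketch calls force() before reading.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileDemo {
    public static final int MAP_SIZE = 4096; // assumed: one typical page

    /** Stores value at offset 0 through a mapping, then reads it back via the channel. */
    public static byte writeThroughMappingAndReadBack(Path file, byte value) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE beyond the current end of the file grows the file.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, MAP_SIZE);
            map.put(0, value); // a plain memory store, no read()/write() call involved
            map.force();       // ask the OS to write the dirty page back

            ByteBuffer one = ByteBuffer.allocate(1);
            ch.read(one, 0);   // ordinary positional read of the same byte
            return one.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("F", ".txt"); // hypothetical scratch file
        file.toFile().deleteOnExit();
        System.out.println((char) writeThroughMappingAndReadBack(file, (byte) 'X'));
    }
}
```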
The Page Cache system is vital to the performance of a multitasking protected mode OS, and every modern operating system (Windows NT upwards, Linux, MacOS, *BSD) supports all these features.
Upvotes: 10
Reputation: 96109
http://ezinearticles.com/?How-an-Operating-Systems-File-System-Works&id=980216
Upvotes: 1