java file programming-languages filesystems operating-system

Roman

Reputation: 66156

What's going on (in the OS level) when I'm reading/writing a file?

Let's say one program is reading file F.txt, and another program is writing to this file at the same moment.

When I think about how I would implement this functionality if I were a systems programmer, I realize that there is ambiguity in:

what will the first program see?
where does the second program write new bytes? (i.e. write "in place" vs write to a new file and then replace the old file with the new one)
how many programs can write to the same file simultaneously?

...and maybe in something less obvious.

So, my questions are:

what are the main strategies for implementing file reading/writing?
which of them are supported in which OS (Windows, Linux, Mac OS etc)?
can it depend on the programming language? (I suppose that Java may try to provide some unified behavior on all supported OSs)

Upvotes: 5

Answers (3)

user334596


There can be as many strategies as there are file systems. Generally, the OS tries to avoid I/O operations by caching file data in memory before it is synchronized with the disk. A read from that cache sees the data previously written to it. So there is a layer of buffering between the software and the hardware (e.g. the MySQL MyISAM engine relies heavily on this layer).

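When a stream is flushed or closed, the JVM hands its own buffered data to the OS, but that data is only guaranteed to reach the disk when the program explicitly asks for a sync, with FileDescriptor.sync() or FileChannel.force() (the Java counterparts of fsync()). The OS may also write its buffers back on its own, for example when the amount of modified data exceeds configured thresholds. The JVM exposes the same API on all supported OSs, although the actual caching and write-back policy is up to each OS.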
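To see the two levels of buffering in practice, here is a minimal sketch (the class FlushDemo and its method are hypothetical, for illustration only). A BufferedOutputStream keeps written bytes inside the JVM until flush() or close(); only then does the OS see them, and even then they normally sit in the OS cache rather than on disk.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FlushDemo {
    /**
     * Hypothetical helper: writes text through a BufferedOutputStream and returns
     * the file size reported by the OS before and after flush().
     */
    static long[] sizesBeforeAndAfterFlush(Path file, String text) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file.toFile()))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
            // The bytes are still in the JVM's user-space buffer: the OS has seen nothing yet.
            long before = Files.size(file);
            // flush() hands the bytes to the OS (page cache); it does NOT force them to disk.
            out.flush();
            long after = Files.size(file);
            return new long[] {before, after};
        }
    }
}
```

For a short string the first size is 0 and the second is the string's length in bytes; a sync would be a separate, explicit step.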

Upvotes: 0

vz0

Reputation: 32923

A single byte read has a long way to travel, from the magnetic platter/flash cell to your local Java variable. This is the path that a single byte travels:

Magnetic platter/flash cell
Internal hard disk buffer
SATA/IDE bus
SATA/IDE buffer
PCI/PCI-X bus
Computer's data bus
Computer's RAM via DMA
OS Page-cache
User-space read buffer (e.g. libc's fopen() buffer in C, or a BufferedInputStream in Java)
Local Java variable

For performance reasons, most of the file buffering done by the OS happens in the page cache, which keeps recently read and written file contents in RAM.

That means that buffered reads and writes in your Java code are done from and to a local user-space buffer, which is itself filled from, and flushed to, the page cache:

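import java.io.*;

// BufferedInputStream keeps a user-space buffer (8 KB by default) in front of the file.
try (InputStream fis = new BufferedInputStream(new FileInputStream("/home/vz0/F.txt"))) {
    // This byte comes from the user space buffer, refilled from the OS page cache as needed.
    int oneByte = fis.read();
}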

A page is usually a 4 KB block of memory. Every page has some flags and attributes; one of them is the "dirty" flag, which means that the page holds modified data not yet written to physical media.

Some time later, when the OS decides to flush the dirty data back to the disk, it sends the data in the opposite direction from where it came.
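Because other processes doing ordinary (non-direct) I/O read through the same page cache, a dirty page is already visible to them before it reaches the disk. A minimal sketch of that, assuming a hypothetical class PageCacheVisibility; both streams live in one process only to keep the example self-contained:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

final class PageCacheVisibility {
    /**
     * Hypothetical helper: writes text with one stream and, while that stream is
     * still open and without any sync, reads the file back through a second stream.
     */
    static String writeThenReadBack(Path file, String text) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try (FileOutputStream writer = new FileOutputStream(file.toFile())) {
            // FileOutputStream is unbuffered: write() goes straight to the OS (page cache).
            writer.write(data);
            // No sync has been requested, so the pages may still be dirty,
            // yet an independent reader already sees the new bytes.
            try (FileInputStream reader = new FileInputStream(file.toFile())) {
                return new String(reader.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }
}
```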

Whenever two distinct processes write data to the same file, the resulting behaviour is:

Impossible, if the file is locked: the second process won't be able to open the file, or will have to wait until the lock is released.
Undefined, if writing over the same region of the file.
Expected, if operating over different regions of the file.
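In Java, a lock can be requested with FileChannel.lock(), which waits for the lock, or tryLock(), which returns null if another program holds an overlapping lock. A minimal sketch with a hypothetical helper appendLocked; note that whether other processes are really kept out is OS-dependent: Windows locks are enforced, while on most POSIX systems they are advisory and only stop processes that also ask for the lock.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class LockedWriter {
    /** Hypothetical helper: appends text while holding an exclusive lock on the whole file. */
    static void appendLocked(Path file, String text) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
             // Blocks until no other program holds a conflicting lock.
             // On most POSIX systems this is advisory: writers that never lock are not stopped.
             FileLock lock = ch.lock()) {
            ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        } // try-with-resources releases the lock first, then closes the channel
    }
}
```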

A "region" depends on the internal buffer sizes that your application uses. For example, on a two-megabyte file, two distinct processes may write:

One on the first 1 KB of data (bytes 0 to 1023).
The other on the last 1 KB of data (bytes 2096128 to 2097151).

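Buffer overlapping and data corruption would occur only if the local buffers were large enough (here, about two megabytes) for one process's buffered write to cover the other's region. In Java you can use channel I/O (java.nio.channels.FileChannel) to read and write files with fine-grained control over what's going on.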
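The two-megabyte example can be sketched with FileChannel's positional write(ByteBuffer, long), which writes at an explicit offset without moving the channel's own position. Each call could run in a separate process; since neither touches the other's 1 KB range, they don't interfere. RegionWriter and its constants are hypothetical names, and the sequential calls in the usage below only illustrate the idea, they don't prove anything about concurrent behaviour.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class RegionWriter {
    static final int FILE_SIZE = 2 * 1024 * 1024; // 2 MB = 2097152 bytes
    static final int REGION = 1024;               // 1 KB

    /** Hypothetical helper: fills [position, position + REGION) and touches no other byte. */
    static void writeRegion(Path file, long position, byte fill) throws IOException {
        byte[] block = new byte[REGION];
        Arrays.fill(block, fill);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(block);
            long pos = position;
            while (buf.hasRemaining()) {
                // Positional write: may be partial, so advance by the bytes actually written.
                pos += ch.write(buf, pos);
            }
        }
    }
}
```

For example, writeRegion(file, 0, (byte) 'A') and writeRegion(file, RegionWriter.FILE_SIZE - RegionWriter.REGION, (byte) 'B') leave a 2097152-byte file whose untouched middle reads back as zeros.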

Many transactional databases force some writes from the RAM buffers back to disk by issuing a sync operation (fsync() on POSIX systems). All the data related to that file then gets flushed back to the magnetic platters or flash cells, ensuring that the synced data survives a power failure.
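In Java, this sync operation is available as FileChannel.force() (or FileDescriptor.sync() when working with streams). A minimal sketch with a hypothetical helper name:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DurableWriter {
    /** Hypothetical helper: replaces the file's content and asks the OS to push it to storage. */
    static void writeDurably(Path file, String text) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // force(true) also writes file metadata (size, timestamps) to storage;
            // without this call the data may stay in the page cache for a while.
            ch.force(true);
        }
    }
}
```

force(false) is only required to write the file's content, not its metadata, and may be cheaper. Either way, durability still relies on the drive honoring the flush request.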

Finally, a memory mapped file is a region of memory that enables a user process to read and write directly from and to the page cache, bypassing the user space buffering.
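Java exposes memory-mapped files through FileChannel.map(), which returns a MappedByteBuffer. A minimal sketch, assuming a hypothetical helper that upper-cases the ASCII letters of a file in place:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedUpper {
    /** Hypothetical helper: upper-cases ASCII letters in place through a memory mapping. */
    static void upperCaseInPlace(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            for (int i = 0; i < map.limit(); i++) {
                byte b = map.get(i);
                if (b >= 'a' && b <= 'z') {
                    // Stores go directly into the mapped pages: no read()/write() calls per byte.
                    map.put(i, (byte) (b - 'a' + 'A'));
                }
            }
            map.force(); // optional: write the dirty pages back to the device now
        }
    }
}
```

Other processes reading the file normally see the change as soon as the bytes are stored, since they read the same pages.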

The page cache is vital to the performance of a multitasking, protected-mode OS, and every modern operating system (Windows NT onward, Linux, macOS, *BSD) supports all these features.

Upvotes: 10

Martin Beckett

Reputation: 96109

http://ezinearticles.com/?How-an-Operating-Systems-File-System-Works&id=980216

Upvotes: 1
