Jimm

Reputation: 8505

Reliable writes in Linux

My requirement is to write a never-ending stream of incoming variable-sized binary messages to the file system. Messages of average size 2 KB arrive at 1000 messages/sec, so in an hour the total data volume is 3600 * 1000 * 2 KB, roughly 7 GB. The main purposes of the messages are: 1. archive them for auditing, and 2. provide a search interface.

My questions are

  1. Is there open source software that solves this problem?
  2. What kind of errors can occur if the process writes in multiples of the block size and crashes in the middle of writing a block?
  3. What kind of errors can occur where the application has written a block, but the file system has not yet flushed the data to disk?
  4. Can inodes get corrupted in any scenario?
  5. Is there a file size limitation in Linux?
  6. Is there an ideal file size? What are the pros and cons of large files (in GB) vs. medium files (in MB)?
  7. Any other things to watch for?
  8. My preference is to use C++, but if needed I can switch to C.

Upvotes: 0

Views: 867

Answers (4)

socketpair

Reputation: 2000

You should use SQLite. It solves everything you need, including speed, if you use the database properly.
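For illustration, an append-only store along these lines might look like the following minimal sketch using the SQLite C API (the table layout and file name are made up, and error handling is abbreviated):

    #include <sqlite3.h>
    #include <cstdio>
    #include <cstring>

    // Minimal append-only store: one row per message, WAL mode for
    // concurrent readers and cheaper commits.
    int main() {
        sqlite3 *db = nullptr;
        if (sqlite3_open("messages.db", &db) != SQLITE_OK) return 1;

        sqlite3_exec(db, "PRAGMA journal_mode=WAL;", nullptr, nullptr, nullptr);
        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS messages("
            "  id INTEGER PRIMARY KEY AUTOINCREMENT,"
            "  received_at INTEGER,"            // unix timestamp
            "  body BLOB);",
            nullptr, nullptr, nullptr);

        const char *payload = "example message";   // stand-in for a binary message
        sqlite3_stmt *stmt = nullptr;
        sqlite3_prepare_v2(db,
            "INSERT INTO messages(received_at, body) VALUES (strftime('%s','now'), ?);",
            -1, &stmt, nullptr);
        sqlite3_bind_blob(stmt, 1, payload, (int)std::strlen(payload), SQLITE_STATIC);
        if (sqlite3_step(stmt) != SQLITE_DONE)
            std::fprintf(stderr, "insert failed: %s\n", sqlite3_errmsg(db));
        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return 0;
    }

In practice you would batch many inserts inside a transaction (and link with -lsqlite3) to keep up with 1000 messages/sec.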

Upvotes: 0

Damon

Reputation: 70206

Once write or writev returns (i.e. the OS has accepted it), the operating system is responsible for writing the data to disk. It's not your problem any more, and it happens regardless of whether your process crashes. Note that you have no control over the exact amount of data accepted or actually written at a time, nor over whether it happens in multiples of filesystem blocks or in any particular size at all. You send a request to write, it tells you how much it actually accepted, and it will write that to disk at its own discretion.
This will probably happen in multiples of the block size, because it makes sense for the OS to do that, but it is not guaranteed in any way (on many systems, Linux included, reading and writing is implemented via, or tightly coupled with, file mapping).
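For illustration, the usual retry loop around write looks roughly like this (a minimal sketch; the helper name write_all is made up):

    #include <unistd.h>
    #include <cerrno>
    #include <cstddef>

    // Keep calling write() until the whole buffer has been accepted by the
    // kernel (or a real error occurs). write() may accept fewer bytes than
    // requested; that is not an error.
    bool write_all(int fd, const char *buf, size_t len) {
        while (len > 0) {
            ssize_t n = write(fd, buf, len);
            if (n < 0) {
                if (errno == EINTR) continue;   // interrupted by a signal, retry
                return false;                   // genuine error
            }
            buf += n;
            len -= static_cast<size_t>(n);
        }
        return true;
    }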

The same "don't have to care" guarantee holds for file mapping (with the theoretical exception that a crashing application could in principle still write into a still mapped area, but once you've unmapped an area, that cannot happen even theoretically). Unless you pull the plug (or the kernel crashes), data will be written, and consistently.
Data will only ever be written in multiples of filesystem blocks, because memory pages are multiples of device blocks, and file mapping does not know anything else, it just works that way.
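A rough sketch of appending through a file mapping (names are made up, error handling is minimal): grow the file with ftruncate, map a page-aligned window around the old end of file, copy, and unmap.

    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstring>
    #include <cstddef>

    // Append 'len' bytes to 'fd' via mmap. 'file_size' is the current file
    // length and is updated. mmap offsets must be page-aligned, so map a
    // window that starts at the page containing the old end of file.
    bool append_mapped(int fd, off_t &file_size, const void *data, size_t len) {
        if (len == 0) return true;
        const long page = sysconf(_SC_PAGESIZE);
        off_t map_start = (file_size / page) * page;                       // page-aligned
        size_t map_len  = static_cast<size_t>(file_size - map_start) + len;

        if (ftruncate(fd, file_size + static_cast<off_t>(len)) != 0)
            return false;                                                  // grow the file first

        void *p = mmap(nullptr, map_len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, map_start);
        if (p == MAP_FAILED) return false;

        std::memcpy(static_cast<char *>(p) + (file_size - map_start), data, len);

        // Once unmapped, the dirty pages belong to the kernel and will be
        // written back even if this process later crashes.
        munmap(p, map_len);
        file_size += static_cast<off_t>(len);
        return true;
    }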

You can, to some extent (neglecting any possible unbuffered on-disk write cache), get some control over what's on the disk with fdatasync. When that function returns, whatever was in the buffers before has been sent to the disk.
However, that still doesn't prevent your process from crashing in another thread in the meantime, and it doesn't prevent someone from pulling the plug. fdatasync is preferable over fsync since it doesn't touch anything near the inode, meaning it's faster and safer (you may lose the last data written in a subsequent crash, since the length has not been updated yet, but you should never destroy/corrupt the whole file).
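For illustration, a minimal append-plus-fdatasync sketch might look like this (file name made up, retry loop from above omitted):

    #include <fcntl.h>
    #include <unistd.h>

    // Open an append-only log and force each message's data to the device.
    // fdatasync() flushes the written data but skips metadata updates that
    // are not needed to read the data back, which is why it is cheaper
    // than fsync().
    int main() {
        int fd = open("messages.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) return 1;

        const char msg[] = "one message\n";
        if (write(fd, msg, sizeof msg - 1) < 0) return 1;   // see retry loop above
        if (fdatasync(fd) != 0) return 1;                   // returns once the kernel buffers
                                                            // have been handed to the device
        close(fd);
        return 0;
    }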

C library functions (fwrite) do their own buffering and give you control over the amount of data you write, but "written" data only means it is stored in a buffer owned by the C library (inside your process). If the process dies, the data is gone. There is no control over how the data hits the disk, or whether it ever does. (N.b.: you do have some control insofar as you can fflush; this immediately passes the contents of the buffers to the underlying write function, most likely writev, before returning. With that, you're back at the first paragraph.)
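For contrast, the stdio path might look like this minimal sketch; until fflush (or until the buffer fills), the bytes live only inside your process:

    #include <cstdio>

    int main() {
        std::FILE *f = std::fopen("messages.log", "ab");   // append, binary
        if (!f) return 1;

        const char msg[] = "one message\n";
        // After fwrite() the bytes may still sit only in the stdio buffer
        // inside this process; a crash here loses them.
        std::fwrite(msg, 1, sizeof msg - 1, f);

        // fflush() hands the buffered bytes to the kernel via write()/writev().
        // From this point on, the situation from the first paragraph applies.
        std::fflush(f);

        std::fclose(f);
        return 0;
    }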

Asynchronous IO (kernel aio) bypasses kernel buffers and usually pulls the data directly from your process. If your process dies, your data is gone. Glibc aio uses threads that block on write, so the same as in the first paragraph applies.
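A minimal glibc (POSIX) AIO sketch, mainly to show that the buffer stays owned by your process until the request completes (real code would use completion notification instead of polling; the file name is made up):

    #include <aio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cerrno>

    int main() {
        int fd = open("messages.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) return 1;

        static char msg[] = "one message\n";    // must stay valid until the request completes

        aiocb cb{};
        cb.aio_fildes = fd;
        cb.aio_buf    = msg;
        cb.aio_nbytes = sizeof msg - 1;

        if (aio_write(&cb) != 0) return 1;      // queued; glibc services it on a helper thread

        while (aio_error(&cb) == EINPROGRESS)   // poll for completion (a sketch, do not spin in real code)
            usleep(1000);

        ssize_t n = aio_return(&cb);            // same result write() would have returned
        close(fd);
        return n < 0 ? 1 : 0;
    }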

What happens if you pull the plug or hit the "off" switch at any time? Nobody knows.
Usually some data will be lost; an operating system can give many guarantees, but it can't do magic. In theory you might have a system that backs its RAM with a battery, or one with a huge dedicated, battery-backed disk cache, but nobody can tell. In any case, plan for losing data.
That said, what has already been written should not normally get corrupted if you keep appending to a file (though really anything can happen, and "should not" does not mean a lot).

All in all, using either write in append mode or file mapping should be good enough; they're as good as you can get anyway. Other than sudden power loss, they're reliable and efficient.
If power failure is an issue, a UPS will give better guarantees than any software solution can provide.

As for file sizes, I don't see any reason to artificially limit them (assuming a reasonably new filesystem). The usual file size limits for "standard" Linux filesystems (if there is any such thing) are in the terabyte range.
Either way, if you feel uneasy with the idea that corrupting one file for whatever reason could destroy 30 days' worth of data, start a new file once every day. It doesn't cost extra.
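If you go the one-file-per-day route, a minimal sketch of deriving the file name from the current date (the naming scheme is made up):

    #include <ctime>
    #include <string>
    #include <fcntl.h>
    #include <unistd.h>

    // Build a name like "messages-<year>-<month>-<day>.log" from today's date,
    // so a new file is started automatically once the day changes.
    std::string log_name_for_today() {
        std::time_t now = std::time(nullptr);
        std::tm tm = *std::localtime(&now);
        char buf[64];
        std::strftime(buf, sizeof buf, "messages-%Y-%m-%d.log", &tm);
        return buf;
    }

    int main() {
        std::string name = log_name_for_today();
        int fd = open(name.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) return 1;
        // ... append messages; reopen when log_name_for_today() changes ...
        close(fd);
        return 0;
    }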

Upvotes: 2

tuxuday

Reputation: 3037

You have an interesting problem at hand. I am not an expert in this area, but I have enough knowledge to comment on it.

If you haven't already, you can read this to get a general overview of the various Linux filesystems, their pros & cons, limits, etc.: Comparison of FileSystems in Linux

1) I have come across auto-rotating log file libraries in Python/Perl; the same is surely available in C/C++ too. 2/3/4) Journaling filesystems protect against filesystem corruption after a crash to a large extent. They also support journaling the data itself, but I haven't used that much.

Check this for more info on journaling

Upvotes: 0

Andreas Florath

Reputation: 4612

The problem here is that the scenario is not described exactly, so some of the answers are guesswork:

  1. Yes - it's called 'g++'. ;-)
  2. Many different ones. IMHO, try to avoid this by writing good and plentiful test cases for your program.
  3. Depending on your system and your program, writing 'only' into a memory buffer is the normal way of doing things. There should be no problem.
  4. This depends on the failure scenarios and on the file system used. (There are also file systems without inodes.)
  5. Each filesystem has its own file size limit. The correct answer (which might be useless for you) is: yes.
  6. No - this heavily depends on your application and your environment (hard disks, backup system, IO system, ...).
  7. More information is needed to answer this.
  8. Not a question.

Hope this helps as a first step. Once you have decided which direction you will go, please add information to the question - the more requirements you give, the better the answers can be.

Upvotes: 0
