aKumara

Reputation: 401

Linux read call behavior when a segment of the file is corrupted

Context :
OS : Red Hat 8.x
File systems : EXT4, XFS
Storage Types : SSD, HDD

Corruption : Here this means any event that prevents written data from being retrieved as it was written, e.g. disk device level corruption.

The Linux read call signature is ssize_t read(int fd, void buf[.count], size_t count);
Say the file referred to by fd has both corrupted and non-corrupted segments. Assume the segments are A(OK)--B(corrupted)--C(OK)--D(corrupted)--E(OK), fd's file position is set before the beginning of A, and "count" is large enough to cover all of segments A through E. If the read request then passes through one or more corrupted segments:

  1. Is it possible for read's return value to be greater than zero (and for buf to contain data)?
    If so,
    1.1. What would buf contain? Would it contain any data from the corrupted segments B and D? What could the return value of read be?

    1.2. How likely is this to happen? What factors could increase the likelihood, e.g. a reboot?

  2. Would the file size returned by fstat count any bytes from the corrupted segments?

Purpose : I am trying to decide (under the OS and file system conditions given above) whether I NEED to add an application-level checksum alongside the written (binary) data and, when reading the same file, validate that checksum whenever read returns success (i.e. return value > 0) before concluding that the data is valid.
Also, I am NOT worried about an intruder modifying the written data here. I am only worried about things that can happen from system activity, e.g. a machine reboot.
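
To make the scheme above concrete, here is a minimal sketch of what such an application-level checksum could look like. Everything in it is an assumption for illustration: the record layout ([length][CRC-32][payload] in host byte order), the helper names write_record and read_record, and the choice of a simple bitwise CRC-32 rather than a faster or stronger checksum.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); slow but dependency-free. */
static uint32_t crc32_buf(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Read exactly len bytes, retrying on short reads and EINTR. Returns 0 or -1. */
static int read_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;              /* error (e.g. EIO) or premature EOF */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes, retrying on short writes and EINTR. Returns 0 or -1. */
static int write_exact(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical record layout: [uint32 len][uint32 crc][payload]. */
static int write_record(int fd, const void *payload, uint32_t len)
{
    assert(payload != NULL || len == 0);
    uint32_t hdr[2] = { len, crc32_buf(payload, len) };
    if (write_exact(fd, hdr, sizeof hdr) != 0)
        return -1;
    return write_exact(fd, payload, len);
}

/* Returns 0 if the payload verified, -1 on I/O failure or truncation,
 * -2 if the length field is implausible or the checksum does not match. */
static int read_record(int fd, void *buf, uint32_t cap, uint32_t *len_out)
{
    uint32_t hdr[2];
    if (read_exact(fd, hdr, sizeof hdr) != 0)
        return -1;
    if (hdr[0] > cap)
        return -2;
    if (read_exact(fd, buf, hdr[0]) != 0)
        return -1;
    if (crc32_buf(buf, hdr[0]) != hdr[1])
        return -2;
    *len_out = hdr[0];
    return 0;
}
```

With this layout, a reader treats data as valid only when read_record returns 0; a return of -1 covers the case where read itself reports failure, and -2 covers the case where read succeeded but the bytes are not what was written.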

Upvotes: 0

Views: 121

Answers (1)

bk2204

Reputation: 76489

If A can be read, the kernel will return the length of A, and that portion of the read will be successful. This would be known as a short read. Once that happens, if you make another call to read and B cannot be read, you will get an EIO error. That could be a problem with a network file system, a bad block, a file system error, or anything else that prevents the data from being read.
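
As a rough illustration of that sequence, the sketch below keeps calling read until the buffer is full, end of file is reached, or an error occurs, and reports both how many bytes were obtained and which errno stopped it. The function name read_until_error is hypothetical; the point is only that a positive byte count and a later EIO can both result from what the caller thinks of as one logical read.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: fill buf with up to count bytes from fd.
 * Returns the number of bytes actually read. *err_out is 0 if the loop
 * stopped because the buffer was full or EOF was reached, otherwise the
 * errno (for example EIO) of the read call that failed. */
static size_t read_until_error(int fd, void *buf, size_t count, int *err_out)
{
    assert(buf != NULL && err_out != NULL);
    unsigned char *p = buf;
    size_t total = 0;

    *err_out = 0;
    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n > 0) {
            total += (size_t)n;     /* possibly a short read; keep going */
        } else if (n == 0) {
            break;                  /* end of file */
        } else if (errno == EINTR) {
            continue;               /* interrupted by a signal; retry */
        } else {
            *err_out = errno;       /* e.g. EIO on an unreadable region */
            break;
        }
    }
    return total;
}
```

In the A through E scenario from the question, one would expect a call like this to return roughly the length of A with *err_out set to EIO, although the exact boundary depends on how the kernel and device split up the I/O.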

Once the read of B fails, it will typically keep failing, because in practice the file offset is not advanced past the failing region (strictly speaking, the offset after a failed read is left unspecified). If you use pread to read an unaffected portion, or lseek to one, you'll be able to continue reading until you hit the next affected portion.
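
A sketch of the pread approach, under stated assumptions: the helper name scan_readable and the fixed chunk size are made up for illustration, and a real unreadable region will not necessarily line up with chunk boundaries, so a failed chunk may contain some bytes that could still be read with a smaller chunk. The file length is taken from fstat, which reports the size recorded in the file's metadata.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: walk a file in fixed-size chunks using pread at
 * explicit offsets, so one failing chunk does not block the ones after it.
 * Returns the number of bytes read successfully, or -1 if the file cannot
 * be opened or stat'ed. *bad_out receives the bytes in failed chunks. */
static long long scan_readable(const char *path, size_t chunk, long long *bad_out)
{
    static char buf[65536];
    assert(chunk > 0 && chunk <= sizeof buf);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    long long good = 0, bad = 0;
    off_t off = 0;
    while (off < st.st_size) {
        size_t want = chunk;
        if ((off_t)want > st.st_size - off)
            want = (size_t)(st.st_size - off);

        ssize_t n = pread(fd, buf, want, off);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted; retry the same offset */
        if (n < 0) {
            bad += (long long)want; /* e.g. EIO: skip past this chunk */
            off += (off_t)want;
        } else if (n == 0) {
            break;                  /* file is shorter than st_size said */
        } else {
            good += n;              /* may be a short read; resume after it */
            off += n;
        }
    }

    close(fd);
    *bad_out = bad;
    return good;
}
```

Because each pread names its own offset, the loop does not depend on where the kernel left the file position after a failure.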

This is generally the standard Unix behaviour, and would be expected of any POSIX system. The error code on failure might differ in some cases on some systems (for example, the OS might automatically remount the file system read only and return some other error code in that case), but generally one reads all the data that can be validly read, and then if further progress is not possible, one gets an error.

Upvotes: 1
