Reputation: 8870
While looking for ways to find the size of a file given a FILE*, I came across this article advising against it. Instead, it seems to encourage using file descriptors and fstat.
However, I was under the impression that fstat, open and file descriptors in general are not as portable (after a bit of searching, I've found something to this effect).
Is there a way to get the size of a file in ANSI C while keeping in line with the warnings in the article?
Upvotes: 14
Views: 12192
Reputation: 101
The executive summary is that you must use fseek/ftell because no alternative, not even the implementation-specific ones, is better.
The underlying issue is that the "size" of a file in bytes is not always the same as the length of the data in the file and that, in some circumstances, the length of the data is not available.
A POSIX example is what happens when you write data to a device; the operating system only knows the size of the device. Once the data has been written and the (FILE*) closed there is no record of the length of the data written. If the device is opened for read the fseek/ftell approach will either fail or give you the size of the whole device.
When the ANSI C committee was sitting at the end of the 1980s, a number of operating systems the members remembered simply did not store the length of the data in a file; rather, they stored the disk blocks of the file and assumed that something in the data terminated it. The 'text' stream represents this. Opening a 'binary' stream on those files shows not only the magic terminator byte, but also any bytes beyond it that were never written but happen to be in the same disk block.
Consequently the C-90 standard was written so that it is valid to use the fseek trick; the result is a conformant program, but the result may not be what you expect. The behavior of that program is not 'undefined' in the C-90 definition and it is not 'implementation-defined' (because on UN*X it varies with the file). Neither is it 'invalid'. Rather you get a number you can't completely rely on or, maybe, depending on the parameters to fseek, -1 and an errno.
In practice if the trick succeeds you get a number that includes at least all the data, and this is probably what you want, and if the trick fails it is almost certainly someone else's fault.
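As a rough illustration of the trick described above, here is a minimal sketch. The helper name file_size_by_seek is hypothetical, and the sketch assumes the stream was opened in binary mode and that the size fits in a long. It restores the original position before returning.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size reported by fseek/ftell,
 * or -1 on failure. Assumes a binary stream. The number is only as
 * reliable as the platform makes it (see the caveats above). */
long file_size_by_seek(FILE *fp)
{
    long start, size;

    start = ftell(fp);                  /* remember the current position */
    if (start < 0)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0)   /* may fail on unseekable streams */
        return -1;
    size = ftell(fp);                   /* offset of the end, in bytes */
    if (fseek(fp, start, SEEK_SET) != 0)
        return -1;                      /* could not restore the position */
    return size;                        /* -1 here means ftell failed */
}
```

A caller would typically open the file with fopen(path, "rb"), call the helper, and treat -1 as "size unknown" rather than as a fatal error.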
John Bowler
Upvotes: 3
Reputation: 58500
The article has a little problem of logic.
It (correctly) identifies that a certain usage of C functions has behavior which is not defined by ISO C. But then, to avoid this undefined behavior, the article proposes a solution: replace that usage with platform-specific functions. Unfortunately, the use of platform-specific functions is also undefined according to ISO C. Therefore, the advice does not solve the problem of undefined behavior.
The quote in my copy of the 1999 standard confirms that the alleged behavior is indeed undefined:
A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END. [ISO 9899:1999 7.19.9.2 paragraph 3]
But undefined behavior does not mean "bad behavior"; it is simply behavior for which the ISO C standard gives no definition. Not all undefined behaviors are the same.
Some undefined behaviors are areas in the language where meaningful extensions can be provided. The platform fills the gap by defining a behavior.
Providing a working fseek which can seek from SEEK_END is an example of an extension in place of undefined behavior. It is possible to confirm whether or not a given platform supports fseek from SEEK_END, and if this is provisioned, then it is fine to use it.
Providing a separate function like lseek is also an extension in place of undefined behavior (the undefined behavior of calling a function which is not in ISO C and not defined in the C program). It is fine to use that, if available.
Note that those platforms which have functions like the POSIX lseek will also likely have an ISO C fseek which works from SEEK_END. Also note that on platforms where fseek on a binary file cannot seek from SEEK_END, the likely reason is that this is impossible to do (no API can be provided to do it and that is why the C library function fseek is not able to support it).
So, if fseek does provide the desired behavior on the given platform, then nothing has to be done to the program; it is a waste of effort to change it to use that platform's special function. On the other hand, if fseek does not provide the behavior, then likely nothing does, anyway.
Note that even including a nonstandard header which is not in the program is undefined behavior. (By omission of the definition of behavior.) For instance if the following appears in a C program:
#include <unistd.h>
the behavior is not defined after that. [See References below.] The behavior of the preprocessing directive #include is defined, of course. But this creates two possibilities: either the header <unistd.h> does not exist, in which case a diagnostic is required. Or the header does exist. But in that case, the contents are not known (as far as ISO C is concerned; no such header is documented for the Library). In this case, the include directive brings in an unknown chunk of code, incorporating it into the translation unit. It is impossible to define the behavior of an unknown chunk of code.
#include <platform-specific-header.h> is one of the escape hatches in the language for doing anything whatsoever on a given platform.
In point form:
<unistd.h>. An undefined ISO C program can be a well defined POSIX C program.
References:
#include <pascal.h> can bring in a pascal keyword for linkage.] http://groups.google.com/group/comp.lang.c/msg/e2762cfa9888d5c6?dmode=source
Upvotes: -2
Reputation: 234354
The article claims fseek(stream, 0, SEEK_END) is undefined behaviour by citing an out-of-context footnote.
The footnote appears in text dealing with wide-oriented streams, that is, streams on which the first operation performed is a wide-character operation.
This undefined behaviour stems from the combination of two paragraphs. First §7.19.2/5 says that:
— Binary wide-oriented streams have the file-positioning restrictions ascribed to both text and binary streams.
And the restrictions for file-positioning with text streams (§7.19.9.2/4) are:
For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.
This makes fseek(stream, 0, SEEK_END) undefined behaviour for wide-oriented streams. There is no such rule like §7.19.2/5 for byte-oriented streams.
Furthermore, when the standard says:
A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.
It doesn't mean it's undefined behaviour to do so. But if the stream supports it, it's ok.
Apparently this exists to allow binary files to have coarse size granularity, i.e. for the size to be a number of disk sectors rather than a number of bytes, and as such allows for an unspecified number of zeros to magically appear at the end of binary files. SEEK_END cannot be meaningfully supported in this case. Other examples include pipes or infinite files like /dev/zero. However, the C standard provides no way to distinguish between such cases, so you're stuck with system-dependent calls if you want to consider that.
Upvotes: 7
Reputation: 123458
You can't always avoid writing platform-specific code, especially when you have to deal with things that are a function of the platform. File sizes are a function of the file system, so as a rule I'd use the native filesystem API to get that information over the fseek/ftell dance. I'd create my own generic wrapper around it, so as to not pollute application logic with platform-specific details and make the code easier to port.
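One way such a wrapper might look is sketched below. The name portable_file_size is hypothetical, the use of long long assumes a C99 compiler, and the Windows branch assumes the Microsoft C runtime's _fstat64 and _fileno. Where neither platform is detected, the sketch falls back to the fseek/ftell dance. The caller is assumed to have flushed any pending writes.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L   /* expose fileno() on strict compilers */
#endif

#include <stdio.h>

#if defined(_WIN32) || defined(__unix__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/stat.h>
#endif

/* Hypothetical wrapper: stores the size in *out and returns 0,
 * or returns -1 if the size could not be determined. */
int portable_file_size(FILE *fp, long long *out)
{
#if defined(_WIN32)
    struct _stat64 st;                       /* Microsoft CRT (assumed) */
    if (_fstat64(_fileno(fp), &st) != 0)
        return -1;
    *out = (long long)st.st_size;
    return 0;
#elif defined(__unix__) || defined(__APPLE__)
    struct stat st;                          /* POSIX */
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    *out = (long long)st.st_size;
    return 0;
#else
    long start = ftell(fp);                  /* ISO C fallback */
    long end;
    if (start < 0 || fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    end = ftell(fp);
    if (fseek(fp, start, SEEK_SET) != 0 || end < 0)
        return -1;
    *out = end;
    return 0;
#endif
}
```

The application then calls only portable_file_size, so porting to another platform means adding one more branch here rather than touching the callers.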
Upvotes: 2
Reputation: 59987
Use fstat. It requires a file descriptor, which you can get from the FILE* with fileno. The size is then within your grasp, along with other details.
i.e.
fstat(fileno(filePointer), &buf);
Where filePointer is the FILE * and buf is
struct stat {
    dev_t     st_dev;     /* ID of device containing file */
    ino_t     st_ino;     /* inode number */
    mode_t    st_mode;    /* protection */
    nlink_t   st_nlink;   /* number of hard links */
    uid_t     st_uid;     /* user ID of owner */
    gid_t     st_gid;     /* group ID of owner */
    dev_t     st_rdev;    /* device ID (if special file) */
    off_t     st_size;    /* total size, in bytes */
    blksize_t st_blksize; /* blocksize for file system I/O */
    blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
    time_t    st_atime;   /* time of last access */
    time_t    st_mtime;   /* time of last modification */
    time_t    st_ctime;   /* time of last status change */
};
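Put together, the call might be used as in the sketch below. This is POSIX rather than ISO C, and the helper name file_size_by_fstat is hypothetical. It rejects anything that is not a regular file, since st_size is not meaningful for devices or pipes, and it assumes the caller has flushed pending writes, because fstat only sees data that has reached the descriptor.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose fileno() on strict compilers */
#endif

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: stores the size in *size and returns 0,
 * or returns -1 if fstat fails or the stream is not a regular file. */
int file_size_by_fstat(FILE *filePointer, off_t *size)
{
    struct stat buf;
    int fd = fileno(filePointer);     /* descriptor behind the stream */

    if (fd < 0)
        return -1;
    if (fstat(fd, &buf) != 0)
        return -1;
    if (!S_ISREG(buf.st_mode))        /* devices, pipes, and so on */
        return -1;
    *size = buf.st_size;              /* total size, in bytes */
    return 0;
}
```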
Upvotes: 4
Reputation: 1872
Different operating systems provide different APIs for this. For example, on Windows we have:
GetFileAttributesEx()
On macOS we have:
[[[NSFileManager defaultManager] attributesOfItemAtPath:someFilePath error:nil] fileSize];
But in plain C the only method is to use fread and fseek: How can I get a file's size in C?
Upvotes: 2
Reputation: 224844
In standard C, the fseek/ftell dance is pretty much the only game in town. Anything else you'd do depends at least in some way on the specific environment your program runs in. Unfortunately said dance also has its problems as described in the articles you've linked.
I guess you could always read everything out of the file until EOF and keep track along the way, with fread() for example.
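A minimal sketch of that read-and-count approach follows. The helper name count_bytes_to_eof and the 4096-byte buffer are arbitrary choices for illustration. It counts from the current position, consumes the stream, and assumes the total fits in a long.

```c
#include <stdio.h>

/* Hypothetical helper: reads from the current position to EOF and
 * returns the number of bytes seen, or -1 on a read error. */
long count_bytes_to_eof(FILE *fp)
{
    char buf[4096];
    long total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;             /* accumulate each chunk's length */

    return ferror(fp) ? -1 : total;   /* distinguish EOF from an error */
}
```

This uses only ISO C, but it costs a full pass over the data, and on a text stream the count reflects the translated characters rather than the bytes on disk.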
Upvotes: 15