xmllmx

Reputation: 42353

Must I open a file in std::ios::binary mode to get its size?

#include <fstream>
#include <string>
#include <cassert>

long long GetFileSizeA(const std::string& file_path)
{
    return std::ifstream
    {
        file_path, std::ios::ate    
    }.tellg();
}

long long GetFileSizeB(const std::string& file_path)
{
    return std::ifstream
    {
        file_path, std::ios::ate | std::ios::binary 
    }.tellg();
}

int main()
{
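    // Note: std::ifstream does not expand '~'. If the open fails, tellg()
    // returns -1 in both functions and the assert below passes trivially.
    auto a = GetFileSizeA("~/test.log");
    auto b = GetFileSizeB("~/test.log");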

    assert(a == b); // always true?
}

If the file ~/test.log contains many \r\n sequences, does the C++ standard guarantee that GetFileSizeA returns the same value as GetFileSizeB?

Upvotes: 2

Views: 1061

Answers (2)

Leon

Reputation: 32484

The C++ standard gives no such guarantee.

In fact, the code

std::ifstream{file_path, std::ios::ate | std::ios::binary}.tellg();

is not guaranteed to work as intended either. On a file-based stream, tellg() goes through a couple of intermediate functions (std::basic_istream::tellg -> std::basic_streambuf::pubseekoff -> std::basic_filebuf::seekoff), and the positioning is specified "as if" by calling std::fseek(). Opening with std::ios::ate likewise positions the file as if by std::fseek(file, 0, SEEK_END). std::fseek(), however, isn't required to support seeking relative to the end of a binary stream:

int fseek( std::FILE* stream, long offset, int origin );

Sets the file position indicator for the file stream stream.

If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Binary streams are not required to support SEEK_END, in particular if additional null bytes are output.

Upvotes: 1

Mats Petersson

Reputation: 129374

The standard by no means guarantees that the two are equal. Neither the C nor the C++ standard states whether files use \r\n, \n, or \r as the line ending; that is defined by the OS and/or the application. What the standard C library (and, by extension, the C++ library) does guarantee is that reading a file in text mode transforms whatever the actual line endings are into the internal \n form. The standard also doesn't guarantee that the two values always differ.

More importantly, if you read part of the file and then ask "where am I?", the answer may well differ depending on whether the file was opened in binary mode or in text mode. If you plan, for example, to map the file into memory and process it as one large string of characters without translating newlines, then you need to treat it as a binary file.

Upvotes: 1
