Reputation: 291
I have the following simple code, which reads the contents of a text file into an array of chars:
const char* name = "test.txt";
std::cout << "Loading file " << name << std::endl;
std::ifstream file;
file.open(name);
file.seekg (0, std::ios::end);
int length = file.tellg();
std::cout << "Size: " << length << " bytes" << std::endl;
file.seekg (0, std::ios::beg);
char* buffer = new char[length];
file.read(buffer,length);
file.close();
std::cout.write(buffer,length);
However, it seems ifstream reads the wrong number of chars from the file: one additional char for each line. I searched the web, and it looks like on Windows 7 text files have a carriage return (\r) in addition to the newline (\n) at the end of each line. The stream somehow does not see these \r characters, yet it still uses the original number of bytes in the file, so it appears to read additional bytes from beyond the end of the file. Is there a way to solve this problem?
If it helps: I use the MinGW compiler on Windows 7 64-bit.
Upvotes: 2
Views: 2957
Reputation: 153899
You're starting from some very erroneous (but widespread) opinions.
file.tellg() doesn't return an int; it returns an implementation-defined
object of type streampos, which must be a class type, and may or may not
be convertible into an integral type. And if it is convertible into an
integral type (and I don't know of an implementation where it isn't,
even if it is not required), there is no guarantee that the resulting
integer represents anything more than a magic cookie which would allow
reseeking to the same position.
In practice, this is probably not a big issue on modern machines: both
Unix and Windows return the offset in bytes from the start of the file.
In the case of Unix, this works fine, because the mapping of the
internal representation to the external one is one to one. In the case
of Windows, there is a remapping of line endings: in a text file, a line
ending is a two byte sequence of 0x0D, 0x0A, which becomes, when read,
the single char '\n'. And streampos (converted to an integral type)
gives the offset in bytes to where you have to seek in the file, and not
the number of chars you have to read to get to that position. For things
like what you seem to be doing, this is not a problem; the allocated
buffer may be a little larger than necessary, but it will never be too
small.
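As a minimal sketch of that point, assuming an implementation where the converted streampos is a byte offset (as described above for Unix and Windows): the value from tellg() can serve as an upper bound for the buffer, and gcount() reports how many chars actually arrived. The helper name read_text_file is hypothetical, not something from the question.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads a whole file opened in text mode.
// Assumption: the converted stream position is a byte offset from the
// start of the file, so it is never smaller than the number of chars
// that read() can deliver.
std::string read_text_file(const char* name)
{
    std::ifstream file(name);                  // text mode (the default)
    if (!file)
        return std::string();

    file.seekg(0, std::ios::end);
    const std::streamoff upper_bound = static_cast<std::streamoff>(file.tellg());
    file.seekg(0, std::ios::beg);
    if (upper_bound <= 0)                      // empty file, or tellg() failed
        return std::string();

    std::string buffer(static_cast<std::size_t>(upper_bound), '\0');
    file.read(&buffer[0], static_cast<std::streamsize>(upper_bound));

    // On Windows, each CR+LF pair arrives as one '\n', so fewer chars may
    // have been read than requested; keep only what was actually read.
    buffer.resize(static_cast<std::size_t>(file.gcount()));
    return buffer;
}
```

Writing out only the first gcount() chars, rather than the full allocated length, avoids printing the unused tail of the buffer.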
Be aware that this may not be true on mainframes. Historically, at
least, mainframes used block oriented files, and the integral value of a
streampos could easily be something broken up into fields, with a
certain number of bits for the block number, and other bits for the byte
offset in the block. Depending on how these are laid out in the word,
a buffer allocated as you do could easily be several orders of magnitude
too big, or if the offset is placed on the high order bits, too small.
The only reliable way of getting the exact size of buffer you need is system dependent, and on some systems (including Windows), there may be no other way except by reading all of the characters and counting them.
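Reading all of the characters and counting them can be sketched as follows. This is one possible approach using std::istreambuf_iterator, not the only one; the function name read_all_chars is made up for illustration.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads every char the stream delivers, without
// asking the file for its size first. The string grows as needed, so
// its size() afterwards is the exact number of chars read.
std::string read_all_chars(const char* name)
{
    std::ifstream file(name);   // text mode; newline translation still applies
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}
```

The size() of the returned string is then the count of chars as seen by the program, which on Windows may be smaller than the size of the file on disk.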
(The reason streampos is required to be a class type is because,
historically, many older multibyte encodings had an encoding state; you
couldn't correctly decode a character without knowing what characters
preceded it. So streampos is required to contain two different pieces
of information: the position to seek in the file, and information about
this state. I don't think that there are any state dependent multibyte
encodings in wide use today, however.)
Upvotes: 1
Reputation: 500167
You might want to open the file in binary mode:
file.open(name, std::ios_base::in | std::ios_base::binary);
Otherwise what happens is that the standard library translates every Windows newline (CR+LF) into a single \n for you.
This means that the number of characters that you can read from the file is not the same as the size of the file. When you call read(), it reads as many characters as it can. If it can't read the number of characters you requested, it sets the stream's failbit.
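A minimal sketch of the binary-mode variant, with a check for a short read. The helper name read_binary_file is hypothetical, and the code assumes the converted stream position is a byte offset, which holds on common Unix and Windows implementations but is not guaranteed everywhere.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical helper: reads a file in binary mode, so no newline
// translation takes place and CR and LF arrive as separate bytes.
std::vector<char> read_binary_file(const char* name)
{
    std::ifstream file(name, std::ios_base::in | std::ios_base::binary);
    std::vector<char> buffer;
    if (!file)
        return buffer;

    file.seekg(0, std::ios_base::end);
    const std::streamoff size = static_cast<std::streamoff>(file.tellg());
    file.seekg(0, std::ios_base::beg);
    if (size <= 0)                             // empty file, or tellg() failed
        return buffer;

    buffer.resize(static_cast<std::size_t>(size));
    if (!file.read(buffer.data(), static_cast<std::streamsize>(size))) {
        // Short read: failbit is set; keep only the bytes that arrived.
        buffer.resize(static_cast<std::size_t>(file.gcount()));
    }
    return buffer;
}
```

With binary mode the buffer contains the \r bytes as well, so code that processes lines has to handle them itself.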
Upvotes: 6