Reputation: 73
I wrote a simple function that reads a whole file into a buffer.
#include <iostream>
#include <fstream>

int main()
{
    std::ios_base::sync_with_stdio(0);
    std::ifstream t;
    t.open("C:\\Users\\sufal\\Desktop\\test.txt");
    t.seekg(0, std::ios::end);
    long length = t.tellg();
    t.seekg(0, std::ios::beg);
    std::cout << "file size: " << length << std::endl;
    char* buffer = new char[length+1];
    t.read(buffer, length);
    t.close();
    buffer[length] = 0;
    std::cout << buffer << std::endl;
    return 0;
}
And this is test.txt:
1
2
3
The output that the program produces looks like this:
The file size should be 5 bytes. Why does my program show the wrong file size? Windows Explorer also seems to show a wrong file size of 7 bytes.
Upvotes: 1
Views: 1389
Reputation: 595981
Your file is 7 bytes in size, because it uses CRLF line breaks.
1[cr][lf]
2[cr][lf]
3
But you are opening the file in text mode, which on Windows normalizes CRLF line breaks to LF. You are allocating room for 7 chars (plus the null terminator) in your buffer, but read() stores only 5 chars:
1[lf]
2[lf]
3
That is why you see the 2 extra = characters at the end of the printed output: you didn't zero out the unused buffer space, so you are seeing random garbage from uninitialized memory.
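If you want to stay in text mode, one way to avoid printing the garbage is to ask the stream how many characters read() actually stored and use only those. The following is a minimal sketch of that idea, not a definitive implementation: the helper name read_text_file is hypothetical, and it assumes that tellg() at the end of the file gives a value at least as large as the translated character count, which holds in practice on Windows but is not something text mode strictly promises.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a file in text mode and returns only the
// characters that read() really stored in the buffer.
std::string read_text_file(const std::string& path)
{
    std::ifstream in(path);  // text mode, as in the question
    if (!in)
        return std::string();  // assumption: an unopenable file yields ""

    in.seekg(0, std::ios_base::end);
    const std::streamoff reported = in.tellg();  // typically the on-disk size
    in.seekg(0, std::ios_base::beg);
    if (reported <= 0)
        return std::string();

    std::vector<char> buffer(static_cast<std::size_t>(reported));
    in.read(buffer.data(), static_cast<std::streamsize>(reported));

    // gcount() is the number of characters extracted by the last read().
    // After CRLF -> LF translation it can be smaller than `reported`.
    const std::streamsize stored = in.gcount();
    return std::string(buffer.data(), static_cast<std::size_t>(stored));
}
```

Building the result from gcount() characters means the unused tail of the buffer is never printed, whatever it contains.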
To do what you are attempting, open the file in binary mode instead.
t.open("C:\\Users\\sufal\\Desktop\\test.txt", std::ios_base::binary);
See Binary and text modes on cppreference.com for more details.
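As a rough sketch of the binary-mode approach, the helper below reads the file as raw bytes, so the size reported by tellg() and the number of bytes read agree. The name read_all_bytes and the choice to throw std::runtime_error on failure are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the file contents byte for byte.
std::string read_all_bytes(const std::string& path)
{
    // Binary mode: no CRLF -> LF translation is applied while reading.
    std::ifstream in(path, std::ios_base::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    in.seekg(0, std::ios_base::end);
    const std::streamoff length = in.tellg();
    in.seekg(0, std::ios_base::beg);
    if (length < 0)
        throw std::runtime_error("cannot determine size of " + path);

    std::string buffer(static_cast<std::size_t>(length), '\0');
    in.read(&buffer[0], static_cast<std::streamsize>(length));

    // Keep only what was actually read, in case the read came up short.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

For the 7-byte test.txt described above, read_all_bytes would return a 7-character string that still contains both \r\n pairs. Returning a std::string also removes the need for the manual new char[] and null terminator.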
Upvotes: 2
Reputation: 35891
On Windows the newline sequence is "\r\n", which consists of two bytes. So, if your file does not end with a newline, 7 is indeed its size:
1 <-- 1 byte for '1', 2 bytes for CRLF
2 <-- 1 byte for '2', 2 bytes for CRLF
3 <-- 1 byte for '3'
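To make the difference between 7 and 5 concrete, here is a small sketch that mimics the CRLF-to-LF translation applied when a file is read in text mode on Windows. The function crlf_to_lf is hypothetical and only approximates what the runtime does; it is meant to illustrate the byte accounting, not to replace the stream's own handling.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: drops the CR of every CR+LF pair, leaving the LF.
std::string crlf_to_lf(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        const bool cr_before_lf =
            raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n';
        if (cr_before_lf)
            continue;  // skip the CR; the LF is copied on the next pass
        out += raw[i];
    }
    return out;
}
```

Applied to the 7 bytes "1\r\n2\r\n3", this produces the 5 characters "1\n2\n3".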
To read the file correctly on a byte level you need to open it in binary mode:
t.open("C:\\Users\\sufal\\Desktop\\test.txt", std::ios_base::binary);
(you can read about the details of this behavior in the documentation).
You can also see other options to read the whole file into a string in C++:
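Two commonly used options are sketched here as an illustration; they are not necessarily the ones the original link covered. Both avoid computing the size up front, so the seekg/tellg mismatch cannot occur. The function names slurp_with_rdbuf and slurp_with_iterators are hypothetical, and both return an empty string if the file cannot be opened.

```cpp
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>

// Option 1: stream the whole file buffer into a string stream.
std::string slurp_with_rdbuf(const std::string& path)
{
    std::ifstream in(path, std::ios_base::binary);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Option 2: build the string from a pair of stream-buffer iterators.
std::string slurp_with_iterators(const std::string& path)
{
    std::ifstream in(path, std::ios_base::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Both versions open the file in binary mode here so the result matches the on-disk bytes; dropping the std::ios_base::binary argument would give the translated text instead.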
Upvotes: 4
Reputation: 1011
On Windows this file is indeed 7 bytes: 1 \r\n 2 \r\n 3
Windows encodes a new line in two bytes: CR + LF (or \r + \n in other notation).
All is correct.
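If you want to check this yourself, a hex dump of the raw bytes makes the CR (0d) and LF (0a) visible. Below is a minimal sketch; hex_dump is a hypothetical helper, and it expects the bytes to have been read in binary mode so that nothing has been translated yet.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats each byte as two hex digits, space separated.
std::string hex_dump(const std::string& bytes)
{
    std::string out;
    char cell[4];  // two hex digits plus the terminating null fit easily
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned value =
            static_cast<unsigned>(static_cast<unsigned char>(bytes[i]));
        std::snprintf(cell, sizeof cell, "%02x", value);
        if (i != 0)
            out += ' ';
        out += cell;
    }
    return out;
}
```

For the contents "1\r\n2\r\n3" this gives "31 0d 0a 32 0d 0a 33", which is seven bytes.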
Upvotes: 1