Reputation: 24554
I'm using the following code to read the content of a PDF file:
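#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

string document;
FILE * f = fopen(path, "rb");  // "rb": binary mode, no newline translation
if (f == NULL) { /* handle the error, e.g. return */ }
unsigned char buffer[1024];
size_t bytes;
// Loop on fread's return value rather than on feof(f)
while ((bytes = fread(buffer, 1, sizeof buffer, f)) > 0) {
    for (size_t i = 0; i < bytes; i++) {
        document += buffer[i];  // keep every byte, including NUL
        cout << buffer[i];      // raw bytes go straight to the console
    }
}
fclose(f);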
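For comparison, the same bytes can be loaded into a std::string with std::ifstream opened in binary mode. This is a minimal sketch, assuming the whole file fits in memory; the function name read_file is hypothetical.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file into a std::string byte for byte.
// std::ios::binary disables newline translation (relevant on Windows), so
// every byte, including NUL and control characters, is kept as-is.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    // istreambuf_iterator reads raw characters without skipping whitespace.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The resulting string holds exactly the file's bytes, so it can be searched or parsed directly; printing it to a console is a separate matter, since the console interprets control bytes.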
The problem is that the characters are not the same as when I open the file in a text editor. For example, this file files.flashfan.ch/file.png
results in this output: files.flashfan.ch/output.png
How can I read the file so that the characters are exactly the same as in the editor? I want to parse PDF files, but without the original characters I can't do this. I've tested the code with this file (it's not a PDF file, just a part of one, so you can't display it):
Thanks for your help!
Upvotes: 4
Views: 8218
Reputation: 107
Try using a hex editor. Text editors like Notepad can't display binary data properly, so you need a hex editor to see the actual bytes. I personally recommend GHex.
Upvotes: 0
Reputation: 25759
It is a binary file, so it makes no sense to open it in a text editor. Use a hex editor instead (like XVI32).
...and do the printing like this:
printf("%#x ", buffer[i]);
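A slightly fuller sketch of the same idea formats each byte as two hex digits, 16 bytes per line, roughly the way a hex editor lays them out. The function name hex_dump is hypothetical.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format raw bytes like a hex editor's main column,
// 16 bytes per line, each byte as two lowercase hex digits.
std::string hex_dump(const unsigned char* data, std::size_t size) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < size; ++i) {
        // setw applies only to the next value, so set it for every byte.
        out << std::setw(2) << static_cast<unsigned>(data[i]);
        out << ((i % 16 == 15 || i + 1 == size) ? '\n' : ' ');
    }
    return out.str();
}
```

Inside the reading loop from the question this could be used as `cout << hex_dump(buffer, bytes);`, which prints only hex digits, spaces and newlines, so no control bytes reach the console.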
Upvotes: 1
Reputation: 55635
I don't see any errors in the way you read the file (the code actually works on my Linux box when I redirect the output to a file). The issue is probably the control characters in the binary data, which mess up the console. Try writing the output to a file and comparing it with the input.
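A minimal sketch of that check, using only C stdio: copy the bytes with the same fread loop but write them with fwrite instead of printing them, then compare the two files byte for byte. The names copy_file and same_contents are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: copy src to dst with an fread/fwrite loop,
// writing the bytes to a file instead of the console.
bool copy_file(const char* src, const char* dst) {
    std::FILE* in = std::fopen(src, "rb");
    if (!in) return false;
    std::FILE* out = std::fopen(dst, "wb");
    if (!out) { std::fclose(in); return false; }
    unsigned char buffer[1024];
    std::size_t bytes;
    bool ok = true;
    while ((bytes = std::fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (std::fwrite(buffer, 1, bytes, out) != bytes) { ok = false; break; }
    }
    if (std::ferror(in)) ok = false;
    std::fclose(in);
    if (std::fclose(out) != 0) ok = false;
    return ok;
}

// Hypothetical helper: true if both files open and contain identical bytes.
bool same_contents(const char* a, const char* b) {
    std::FILE* fa = std::fopen(a, "rb");
    std::FILE* fb = std::fopen(b, "rb");
    bool same = fa && fb;
    while (same) {
        int ca = std::fgetc(fa);
        int cb = std::fgetc(fb);
        if (ca != cb) same = false;   // differing byte or differing length
        else if (ca == EOF) break;    // both ended together: identical
    }
    if (fa) std::fclose(fa);
    if (fb) std::fclose(fb);
    return same;
}
```

If same_contents returns true for the original and the copy, the reading code is fine and only the console display differs.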
Upvotes: 3