Van Coding

Reputation: 24554

C++ Reading a PDF file

I'm using the following code to read the content of a PDF file:

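string document;
FILE* f = fopen(path, "rb");   // "rb": binary mode, no newline translation
if (f != NULL) {
    unsigned char buffer[1024];
    size_t bytes;
    // Loop on the fread result rather than feof(), which only becomes true after a read hits end of file.
    while ((bytes = fread(buffer, 1, sizeof buffer, f)) > 0) {
        for (size_t i = 0; i < bytes; i++) {
            document += buffer[i];
            cout << buffer[i];
        }
    }
    fclose(f);
}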

The problem is that the characters are not the same as when I open the file in a text editor. For example, this file: files.flashfan.ch/file.png

results in this output: files.flashfan.ch/output.png

How can I read the file so that the characters are exactly the same as in the editor? I want to parse PDF files, but without the original characters I can't do this. I've tested the code with this file (it's not a complete PDF file, just a part of one, so you can't display it):

PDF Head.pdf

Thanks for your help!

Upvotes: 4

Views: 8218

Answers (3)

Pizearke

Reputation: 107

Try using a hex editor. Text editors such as Notepad often can't display binary data properly, so you have to view the file with a hex editor. I personally recommend ghex.

Upvotes: 0

Johan Kotlinski

Reputation: 25759

It is a binary file, so it makes no sense to open it in a text editor. Use a hex editor instead (like XVI32)

...and do the printing like this:

printf("%#x ", buffer[i]);
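
To make the hex-printing idea a bit more concrete, here is a minimal sketch of a helper that formats a buffer as hex text. The name to_hex is hypothetical, and it uses "%02x" instead of "%#x" so that every byte comes out as exactly two digits; treat it as one possible variation, not the only way to do it.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats each byte as two lowercase hex digits
// followed by a space, so control characters never reach the console raw.
std::string to_hex(const unsigned char* data, std::size_t size) {
    std::string out;
    char piece[4];  // two hex digits + space + terminating '\0'
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(piece, sizeof piece, "%02x ",
                      static_cast<unsigned>(data[i]));
        out += piece;
    }
    return out;
}
```

In the loop from the question, this could replace the per-byte cout, for example cout << to_hex(buffer, bytes); after each fread call.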

Upvotes: 1

vitaut

Reputation: 55635

I don't see any errors in the way you read the file (the code actually works on my Linux box when I redirect the output to a file). The issue is probably the control characters, which mess up the console. Try writing the output to a file and comparing it with the input.
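
A rough sketch of that check, assuming C++ streams instead of the fopen/fread calls from the question. The function names read_bytes, write_bytes and round_trip_matches are made up for illustration. It reads the file into memory, writes the bytes back out unchanged, and compares the copy with the original.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Reads the whole file in binary mode; yields an empty string if the
// file cannot be opened.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Writes the bytes unchanged; binary mode avoids newline translation
// on platforms that distinguish text and binary streams.
bool write_bytes(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios::binary);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();
    return static_cast<bool>(out);
}

// Copies input to copy via memory and reports whether both files hold
// identical bytes.
bool round_trip_matches(const std::string& input, const std::string& copy) {
    const std::string original = read_bytes(input);
    if (!write_bytes(copy, original)) return false;
    return read_bytes(copy) == original;
}
```

If round_trip_matches returns true for the PDF fragment, the bytes are being read correctly and the odd output is only a display problem in the console.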

Upvotes: 3
