How to store a PDF file as binary

Question

I am working on a project that makes use of a PDF template.

I was wondering whether it is possible to store a PDF file as binary data, then recreate the PDF at a later stage from that data?

I was hoping this would help save space: rather than having to ship a PDF file with the project, it would be much more efficient to store it as data.

Any insight into a solution would be very much appreciated.

Thanks in advance

Mats Petersson · Accepted Answer

I doubt very much that you'd save ANY space that way, since you will need some code to "unstore" the data that is the PDF. It may not take up MUCH space, but it will be SOME bytes in your executable. On top of that, a PDF is usually already compressed internally, so you won't gain much from any form of compression you may be thinking of using.

A simple experiment as to "how much smaller does something get" is to pack it in a zip-file. If it turns out that it's the same size or slightly larger, then it is already compressed.
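If no zip tool is at hand, a rough stand-in for that experiment is to measure the byte entropy of the file. This is not the zip test itself, only a heuristic sketch: a value close to 8 bits per byte suggests the data is already compressed, while a much lower value suggests a compressor would still find something to squeeze. The function name `byte_entropy` is made up for this example.

```cpp
#include <cmath>
#include <cstddef>

// Zero-order Shannon entropy of a buffer, in bits per byte.
// Returns 0.0 for an empty buffer. This is only a heuristic: data with
// long repeats can score high here and still shrink a lot in a zip file.
double byte_entropy(const unsigned char *bytes, std::size_t n)
{
    if (n == 0)
        return 0.0;

    // Histogram of byte values.
    std::size_t counts[256] = {0};
    for (std::size_t i = 0; i < n; ++i)
        ++counts[bytes[i]];

    // Sum of -p * log2(p) over the byte values that actually occur.
    double entropy = 0.0;
    for (int value = 0; value < 256; ++value)
    {
        if (counts[value] == 0)
            continue;
        const double p = static_cast<double>(counts[value]) / static_cast<double>(n);
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

Reading the PDF into a buffer and passing it to `byte_entropy` gives a single number to look at; the zip experiment remains the more reliable check.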

Using a "binary dump" program of some sort (probably will need to either write some code, or cobble together a script, or both), you can have a large binary blob in a program by using something like this:

Data bytes (in hex - just a sample, not a PDF):

 01 3E 78 28 41 FF EE AA ...

Data in C/C++ style:

 unsigned char data[] =
  "\001>x(A\377\356\252";

Long lines can/will have to be split, like this:

 unsigned char data[] =
  "\001>x(A\377"
  "\356\252";

You may find that this doesn't work because the compiler has a maximum size for string literals. Most modern compilers set that limit quite high, but the standards guarantee far less (C99 only requires support for 4095 characters after concatenation, and C++ merely recommends 65536), and if you compile with high warning levels, the compiler may warn that the string is longer than compilers are required to support.

Depending on the mix of values, it may be better as:

 unsigned char data[] =
 { 1, 62, 120, 40, 65, 255, 238, 170 };

(From a source size perspective, the spaces are not required, so the code can be made a fair bit smaller, at least 20%, by removing them. I've kept them in for readability.)

You'll have to experiment to find which is more effective. But no matter which you pick, the source will take up more space than the original file. If the file is largely text, not very much more. If it's "truly binary data", it will be noticeably larger.
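One way to run that experiment without generating both files is to count the source characters each style would need. The following is a minimal sketch under a few assumptions: the string form always uses three-digit octal escapes for non-printable bytes, the array form uses decimal numbers with commas and no spaces, and the names `literal_cost`, `array_cost`, `SourceSize` and `measure` are invented for this example.

```cpp
#include <cstddef>

// Source characters needed for one byte inside a string literal.
std::size_t literal_cost(unsigned char b)
{
    if (b == '"' || b == '\\')
        return 2;   // needs a backslash in front of the character
    if (b >= 0x20 && b <= 0x7E)
        return 1;   // printable ASCII, stored as-is
    return 4;       // backslash plus three octal digits
}

// Source characters needed for one byte in a brace-enclosed list:
// its decimal digits plus one comma.
std::size_t array_cost(unsigned char b)
{
    const std::size_t digits = b >= 100 ? 3 : (b >= 10 ? 2 : 1);
    return digits + 1;
}

struct SourceSize
{
    std::size_t as_literal;  // total characters in string-literal form
    std::size_t as_array;    // total characters in decimal-array form
};

// Add up both costs over a whole buffer.
SourceSize measure(const unsigned char *bytes, std::size_t n)
{
    SourceSize size = { 0, 0 };
    for (std::size_t i = 0; i < n; ++i)
    {
        size.as_literal += literal_cost(bytes[i]);
        size.as_array += array_cost(bytes[i]);
    }
    return size;
}
```

For the eight sample bytes from the hex listing this comes out at 20 characters for the string form and 27 for the array form, which matches the point that mostly printable data favours the string.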

A quick google found this: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka10382.html which appears to link to a program which does the "binary to C data" translation.

Code to extract binary data into an unsigned char array:

#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>

void usage()
{
    std::cerr << "bintoc infile outfile" << std::endl;
}

int main(int argc, char **argv)
{
    if (argc != 3)
    {
        std::cerr << "Incorrect number of arguments..." << std::endl;
        usage();
        exit(1);
    }

    std::ifstream in(argv[1], std::ios::binary);
    std::ofstream out(argv[2]);

    if (!in)
    {
        std::cerr << "Could not open " << argv[1] << std::endl;
        exit(1);
    }

    if (!out)
    {
        std::cerr << "Could not open " << argv[2] << std::endl;
        exit(1);
    }

    unsigned char buffer[16];

    out << "unsigned char data[] = " << std::endl << "{" << std::endl;
    // read() reports failure on a short final chunk, so also check gcount()
    // to avoid dropping the last few bytes of the file.
    while (in.read(reinterpret_cast<char *>(buffer), sizeof(buffer)) || in.gcount() > 0)
    {
        // Emit one line of up to 16 decimal values per chunk read.
        for (std::streamsize i = 0; i < in.gcount(); i++)
        {
            out << std::setw(3) << static_cast<int>(buffer[i]) << ", ";
        }
        out << std::endl;
    }
    out << "};" << std::endl;

    return 0;
}
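Going the other way, recreating the file from the embedded array, takes only a few lines. This is a minimal sketch, not part of the tool above: the `data` array here is a stand-in holding the eight sample bytes from the hex listing (a real build would use the generated array), and `write_blob` is a name made up for this example.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Stand-in for the generated array; replace with the real output.
static const unsigned char data[] = { 1, 62, 120, 40, 65, 255, 238, 170 };

// Write `size` bytes to `path` in binary mode. Returns true on success.
bool write_blob(const std::string &path, const unsigned char *bytes, std::size_t size)
{
    std::ofstream out(path.c_str(), std::ios::binary);
    if (!out)
        return false;

    out.write(reinterpret_cast<const char *>(bytes),
              static_cast<std::streamsize>(size));
    out.flush();  // push buffered bytes out so write errors show up now
    return out.good();
}
```

Calling `write_blob("template.pdf", data, sizeof(data))` would then produce the file on disk; the file name is only an illustration. Note that `sizeof(data)` gives the right length for the brace-initialised form, whereas the string-literal form carries one extra terminating zero byte.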
