Pearl

Reputation: 9415

How to convert a PdfContentByte to an array of bytes

I'm using the iTextSharp DLL in ASP.NET.

PdfReader reader = new PdfReader(path);
// Create footer
MemoryStream outStream = new MemoryStream();
PdfStamper textStamp = new PdfStamper(reader, outStream);
BaseFont baseFont = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, Encoding.ASCII.EncodingName, false);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    PdfContentByte pdfPageContents = textStamp.GetOverContent(i);
    // How to convert the PdfContentByte to an array of bytes here?
}

I want to convert each page of the PDF to a JPEG. How can I convert the PdfContentByte to an array of bytes here?

Upvotes: 1

Views: 3005

Answers (3)

Tim Baas

Reputation: 6185

You can get a byte[] of a PdfContentByte as follows:

pdfPageContents.InternalBuffer.ToByteArray();

Upvotes: 0

Chris Haas

Reputation: 55417

I don't think your plan is going to work. Not everything that looks like it lives on a "page" actually lives on that page; some things, such as fonts and other shared resources, live in a global shared location. So extracting only a page's bytes would give you a corrupt document. You could extract every page in a PDF to a separate file, which would bring over these shared resources, but the result is still in the PDF format. If you have already written a PDF-to-JPEG routine then maybe you're OK. If you haven't, then iTextSharp won't be able to help you.
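To illustrate the "separate files" option above, here is a minimal sketch that copies each page into its own in-memory PDF using PdfCopy. It assumes iTextSharp 5.x; the class and method names (PdfPageSplitter, SplitIntoPages) are made up for this example. Each resulting byte[] is a complete single-page PDF, not a JPEG.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
static class PdfPageSplitter
{
    // Returns one complete single-page PDF (as bytes) per page of the source file.
    public static List<byte[]> SplitIntoPages(string path)
    {
        var pages = new List<byte[]>();
        PdfReader reader = new PdfReader(path);
        try
        {
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                using (MemoryStream ms = new MemoryStream())
                {
                    Document document = new Document(reader.GetPageSizeWithRotation(i));
                    PdfCopy copy = new PdfCopy(document, ms);
                    document.Open();
                    // Importing the page also brings over the shared resources it uses.
                    copy.AddPage(copy.GetImportedPage(reader, i));
                    // Closing the document finishes the PDF; ToArray still works afterwards.
                    document.Close();
                    pages.Add(ms.ToArray());
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return pages;
    }
}
```

Turning any of these single-page PDFs into a JPEG would still need a separate rendering library.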

iTextSharp doesn't (currently) "know" what a PDF "looks" like; it only knows the contents of the PDF. It "knows" that a run of text exists but it doesn't "know" how that should be rendered visually. It "knows" that a PDF might have two images but doesn't "know" or even care if they overlap. Once again, that's the renderer's problem.

Once again, if you've written a PDF-to-JPEG routine then disregard all that I'm saying. But the bytes of a PDF have nothing in common with the bytes of a JPEG. Although a PDF may contain a JPEG, it can also contain many other types of binary data. And that data is probably compressed inside of a stream, too.

Now, if you're looking to just extract images from a PDF, that is something that iTextSharp can help you with.
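As a rough sketch of that image-extraction route, assuming iTextSharp 5.x and its iTextSharp.text.pdf.parser namespace, a render listener can collect the raw bytes of each image the parser encounters. The ImageCollector class name is invented for this example, and GetImage() may not support every image encoding.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf.parser;

// Hypothetical listener that gathers embedded images while a page is parsed.
class ImageCollector : IRenderListener
{
    public readonly List<byte[]> Images = new List<byte[]>();

    // Text events are not needed for image extraction.
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        PdfImageObject image = renderInfo.GetImage();
        if (image != null)
        {
            // Bytes of the embedded image itself, not a rendering of the page.
            Images.Add(image.GetImageAsBytes());
        }
    }
}
```

It would be used roughly as `new PdfReaderContentParser(reader).ProcessContent(pageNumber, collector);`, once per page. This only recovers images already embedded in the PDF; it does not render text or vector content.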

Upvotes: 2

Leo Chapiro

Reputation: 13984

Try this:

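PdfReader reader = new PdfReader(path);
MemoryStream outStream = new MemoryStream();
PdfStamper textStamp = new PdfStamper(reader, outStream);
// Close the stamper first so the complete PDF is flushed to the stream
textStamp.Close();
reader.Close();
// Bytes of the whole stamped PDF document, not of a single page
byte[] content = outStream.ToArray();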

Upvotes: 2
