Reputation: 155
I have what I hope is an easy question. I'm trying to use iTextSharp to modify some PDF files; however, it seems that the XMP metadata iTextSharp puts at the end of the files is ruining their layout (and I'm not conversant enough with the PDF format to understand why).
You can see from the two images above that the document appears to have been rotated. Comparing the PDF files at the binary level, however, the only difference appears to be some XMP metadata at the end of the files.
I've tried opening the files in several PDF viewers (Sumatra PDF, the Edge browser and Adobe Acrobat), and all show the same weirdness.
I guess I have two questions: a) How can the PDF file be so altered just by having XMP metadata at the end of the file? b) How can I make iTextSharp not produce this output? (iTextSharp only seems to do this when I add/edit content, not when I just strip out JavaScript or similar.)
<EDIT 1>
The code that I'm using for the iTextSharp is the PdfContentStreamEditor (verbatim) from the post here: https://stackoverflow.com/a/35915789/2535822
</EDIT 1>
<EDIT 2>
OK, it seems that it's not the XMP metadata. I got rid of that by using:
pdfStamper.XmpMetadata = new byte[0];
However, there is still a bunch of extra data placed at the end of the file:
2 0 obj
<</Producer(PDFCreator 2.5.2.5233; modified using iTextSharp™ 5.5.13 ©2000-2018 iText Group NV \(AGPL-version\))/CreationDate(D:20171206173510+10'30')/ModDate(D:20180325144710+11'00')/Title(þÿ
endobj
404 0 obj
<</Length 0/Type/Metadata/Subtype/XML>>stream
endstream
endobj
405 0 obj
<</Length 3638/Filter/FlateDecode>>stream
....
</EDIT 2>
Upvotes: 1
Views: 1168
Reputation: 582
I can answer your second question. The metadata you are trying to remove is not supposed to be removed: the DLL of the AGPL version you are using adds it no matter what you do in code. You will not be able to remove it with iText, as doing so is a direct violation of their licence terms. Please refer to: https://itextpdf.com/AGPL
You must prominently mention iText and include the iText copyright and AGPL license in output file metadata.
Upvotes: 1
Reputation: 95918
You have run into two separate issues: one is indeed a bug in the PdfContentStreamEditor I used in this answer, while the other requires one to know how to disable a special feature (or quirk, depending on the circumstances) of iText.
This part deals with the rotation of content in the sample document PHA-Pro 8 - File.pdf provided by the OP.
As you have already seen yourself, the rotation issue appears to be connected to the fact that the rotation of the page in question is not 0.
Indeed, the iText PdfStamper has a feature which, for rotated pages, automatically rotates additions one applies to the OverContent or UnderContent. This feature can be quite handy if you want to add upright content to the page without having to apply the rotation yourself. In the case of the PdfContentStreamEditor, though, all coordinates we receive from the existing content already have the applicable rotation factored in.
Thus, we need to disable this feature. One can do so using the PdfStamper property RotateContents:
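// requires: using System.IO; using iTextSharp.text.pdf;
using (PdfReader pdfReader = new PdfReader(source))
// (char)0 keeps the original PDF version; the final true selects append mode
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create, FileAccess.Write), (char)0, true))
{
    // don't auto-rotate OverContent/UnderContent additions on rotated pages:
    // the coordinates copied from the existing content already account for the rotation
    pdfStamper.RotateContents = false;
    PdfContentStreamEditor editor = new PdfContentStreamEditor();
    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        editor.EditPage(pdfStamper, i);
    }
}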
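To see why the automatic rotation hurts here, consider what compensating for a page rotation involves. The C# sketch below is a toy model, not iText's actual code: it assumes a MediaBox of [0 0 width height] and maps a point given in upright (as displayed) coordinates into default user space for /Rotate values of 0, 90, 180 and 270, PDF pages being rotated clockwise when displayed. The A4 page size and the sample point are made-up values. Applying this mapping to a coordinate that comes from the existing content stream, and thus is already in user space, moves it somewhere else entirely (here even off the page), which is consistent with the edited content coming out rotated.

```csharp
using System;

// Toy model of /Rotate compensation (not iText's actual code). Assumes the
// MediaBox is [0 0 width height]; PDF rotates pages clockwise when displayed.
// Maps a point given in upright (as displayed) coordinates into user space.
static (double X, double Y) UprightToUserSpace(int rotate, double width, double height, double x, double y) =>
    rotate switch
    {
        0 => (x, y),
        90 => (width - y, x),            // same as "0 1 -1 0 width 0 cm"
        180 => (width - x, height - y),  // same as "-1 0 0 -1 width height cm"
        270 => (y, height - x),          // same as "0 -1 1 0 0 height cm"
        _ => throw new ArgumentException("rotate must be 0, 90, 180 or 270", nameof(rotate)),
    };

// Hypothetical A4 portrait page with /Rotate 90, and a text position read from
// the existing content stream, i.e. a point that is already in user space.
const double pageWidth = 595, pageHeight = 842;
(double X, double Y) fromContent = (100, 700);

// Compensating for the rotation a second time moves the point elsewhere:
// x becomes pageWidth - 700, which is negative, i.e. outside the MediaBox.
var rotatedAgain = UprightToUserSpace(90, pageWidth, pageHeight, fromContent.X, fromContent.Y);
Console.WriteLine($"{fromContent} -> {rotatedAgain}");
```

Setting RotateContents = false simply avoids this second transformation, so the copied coordinates are written back unchanged.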
This part deals with the scrambling of text in the sample document AS62061-2006.pdf provided by the OP.
You have found a bug in the PdfContentStreamEditor. Its Write method contains this loop:
foreach (PdfObject pdfObject in operands)
{
    pdfObject.ToPdf(canvas.PdfWriter, canvas.InternalBuffer);
    canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
}
It should instead be:
foreach (PdfObject pdfObject in operands)
{
    // no writer here: strings inside a content stream must not be encrypted individually
    pdfObject.ToPdf(null, canvas.InternalBuffer);
    canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
}
If one passes the PdfWriter to the ToPdf method of a PdfString and the PdfWriter uses encryption, the string contents get encrypted. Here, however, the string is written into a content stream, and in that case it is not the individual string that must be encrypted but, eventually, the whole stream.
This applies to the PDF provided by the OP because it is encrypted and the code uses PdfStamper in append mode, which encrypts the additions using the same password as the original file.
With the original code, the result looks like this:
With the fixed code, it looks like this:
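The effect of encrypting a string that is then written into an encrypted stream can be reproduced with a toy model. The C# sketch below is illustrative only and makes several assumptions: it uses RC4 (one of the ciphers the PDF standard security handler can use; which one the OP's file uses is not stated), a single made-up key where real PDF encryption typically derives per-object keys, and it ignores string escaping. The viewer decrypts the stream exactly once, so a string that was already encrypted before the stream was encrypted is still ciphertext afterwards.

```csharp
using System;
using System.Text;

// RC4 stream cipher; encrypting and decrypting are the same operation.
static byte[] Rc4(byte[] key, byte[] data)
{
    var s = new byte[256];
    for (int i = 0; i < 256; i++) s[i] = (byte)i;
    for (int i = 0, j = 0; i < 256; i++)
    {
        j = (j + s[i] + key[i % key.Length]) & 0xFF;
        (s[i], s[j]) = (s[j], s[i]);
    }
    var result = new byte[data.Length];
    for (int k = 0, i = 0, j = 0; k < data.Length; k++)
    {
        i = (i + 1) & 0xFF;
        j = (j + s[i]) & 0xFF;
        (s[i], s[j]) = (s[j], s[i]);
        result[k] = (byte)(data[k] ^ s[(s[i] + s[j]) & 0xFF]);
    }
    return result;
}

// Builds a tiny content stream "BT (...) Tj ET" around the given raw string bytes.
static byte[] ContentStream(byte[] stringBytes)
{
    byte[] prefix = Encoding.ASCII.GetBytes("BT (");
    byte[] suffix = Encoding.ASCII.GetBytes(") Tj ET\n");
    var stream = new byte[prefix.Length + stringBytes.Length + suffix.Length];
    prefix.CopyTo(stream, 0);
    stringBytes.CopyTo(stream, prefix.Length);
    suffix.CopyTo(stream, prefix.Length + stringBytes.Length);
    return stream;
}

// Writer side: encrypt the whole stream, in the buggy variant after having
// encrypted the string once already. Viewer side: decrypt the stream once and
// read the string bytes back out (they start right after "BT (").
static string ViewerSees(string text, byte[] key, bool encryptStringToo)
{
    byte[] stringBytes = Encoding.ASCII.GetBytes(text);
    if (encryptStringToo)
        stringBytes = Rc4(key, stringBytes);          // like ToPdf(writer, ...) in the original loop
    byte[] encryptedStream = Rc4(key, ContentStream(stringBytes));
    byte[] decrypted = Rc4(key, encryptedStream);     // the viewer decrypts the stream once
    return Encoding.ASCII.GetString(decrypted, 4, text.Length);
}

byte[] objectKey = Encoding.ASCII.GetBytes("hypothetical-object-key");
Console.WriteLine(ViewerSees("Hello", objectKey, encryptStringToo: false)); // fixed loop: Hello
Console.WriteLine(ViewerSees("Hello", objectKey, encryptStringToo: true));  // original loop: scrambled
```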
Upvotes: 2