Reputation: 2170
I'm working on some code that concatenates PDF files using iTextSharp. I'm having a problem with a particular PDF that contains some read-only fields and a field that is editable (I believe they're AcroFields). In the output file all of the fields are editable.
Here is the code that I use (I've simplified it to read only one PDF):
public static void Concat(string outputFilePath, string inputFilePath)
{
using (var document = new Document())
{
using (var fileStream = new FileStream(outputFilePath, FileMode.Create, FileAccess.ReadWrite))
using (var copier = new PdfCopy(document, fileStream))
{
copier.SetMergeFields();
document.Open();
var reader = new PdfReader(inputFilePath);
copier.AddDocument(reader);
copier.AddJavaScript(reader.JavaScript);
copier.Close();
}
document.Close();
}
}
Any ideas on how to preserve the attributes of the fields?
Upvotes: 1
Views: 931
Reputation: 96064
It looks like iText and Adobe Reader interpret the form field structure differently. E.g. look at this parent field with one child:
(Object 24 is referenced from the AcroForm dictionary Fields array. Object 130 is referenced from the Page dictionary ANNOTS array.)
So we have two field objects named PageDataCollection1[0].txtCity
, the objects 24 and 130, the widget annotation being merged into the latter.
iText considers the terminal field object (object 130) to be completely in charge, using its Ff value 0
which among other things means not read-only.
Adobe Reader, on the other hand, considers the terminal field object (object 130) only to be partially in charge, using its DA value but not its Ff value. Instead the parent Ff value 1
is used which among other things means read-only.
In the course of copying the document pages, the hierarchies are flattened making the different interpretation visible.
Ad hoc I would say the behavior of iText is correct here.
The behavior of Adobe Reader might be justified with this section from the specification ISO 32000-1:
It is possible for different field dictionaries to have the same fully qualified field name if they are descendants of a common ancestor with that name and have no partial field names (T entries) of their own. Such field dictionaries are different representations of the same underlying field; they should differ only in properties that specify their visual appearance. In particular, field dictionaries with the same fully qualified field name shall have the same field type (FT), value (V), and default value (DV).
(section 12.7.3.2 Field Names)
Maybe Adobe Reader tries to enforce that different representations of the same field only differ in properties that specify their visual appearance, by ignoring other properties in descendant fields without partial field names.
As there are no different representations of the field, though, there is no need for this measure here.
There is an alternative interpretation of the object structure here, @rhens proposed
There aren't 2 fields with the same name: object 24 is the field dictionary, object 130 is the widget annotation.
IMO this interpretation does not match the PDF specification even though it would explain the behavior of Adobe Reader.
While indeed the Kids array of a form field may contain either child fields or widgets, the object 130 in my opinion has to be considered a field (which has its own widget merged into itself) rather than a widget of field object 24.
To check whether some kid dictionary object is a child field or merely a widget, it does not suffice to find widget-specific entries in the kid: such entries can also be in a child field which has its single widget merged into itself. Thus, one instead has to check for field-specific entries in the kid.
In the case at hand the kid object 130 does have field-specific entries (foremost the field type FT but also the field flags Ff) and, therefore, should be considered a child field.
That all been said, it indeed is possible that Adobe does consider that object a mere widget (which, as mentioned above, would explain the behavior). This interpretation would not be inspired by the specification, though, as explained above. But it might be inspired by a non-negligible amount of documents from the wild which erroneously have additional field-specific entries in their plain widgets and require this interpretation to be displayed as designed.
Upvotes: 3