Reputation: 33
I'm working on a PDF accessibility assignment, which is to add alternative text in a tagged PDF. I got the sample code for the same at: Add alternative text for an image in tagged pdf (PDF/UA) using iText
I was excited that my task would be finished quickly, without much R&D.
I created a Java project based on that code, and when I executed it, it worked perfectly for the input PDF used in the iText example.
Unfortunately, the same source code is not working with PDFs tagged using Acrobat.
Sample Inputs: iText PDF: no_alt_attribute.pdf & My PDF: SARO_Sample_v1.7.pdf
Issue:
// This line works and returns RootElement
PdfDictionary structTreeRoot = catalog.getAsDict(PdfName.STRUCTTREEROOT);
// --> This line always returns NULL,
// Instead of returning the child elements of RootElement
PdfArray kids = structTreeRoot.getAsArray(PdfName.K);
// --> As per the structure Kids are present
I compared the structure of both PDFs, and these are my observations:
The iText PDF contains /T:StructElem, whereas that is not found in my PDF. Even re-tagging doesn't help.
I verified this with various tagged PDFs available to us, and all are similar (without /T:StructElem). These PDFs are validated and have passed accessibility compliance.
I need some thoughts on how to make this source code work with the PDFs we have. Alternatively, I need a way to add the missing /T:StructElem automatically to the PDFs while tagging in Acrobat.
Any help will be much appreciated!
Please do let me know if any further information is needed.
Note: I'm still not sure that adding this /T:StructElem will work, since the PDFs passed PAC.
If this were really an issue, those PDFs wouldn't pass validation, right? But this is the only difference I found between the two PDFs.
PS: The Acrobat version I'm using is "Adobe Acrobat (Pro) DC."
-- Thanks,
SaRaVaNaN
Upvotes: 3
Views: 524
Reputation: 95918
Bruno's code in the referenced answer does not walk the whole structure tree because he did not implement all cases of the K contents. The structure element K entry is specified like this:
The children of this structure element. The value of this entry may be one of the following objects or an array consisting of one or more of the following objects in any combination: [...]
(ISO 32000-2, Table 355 — Entries in a structure element dictionary)
Bruno's code, though, always assumes the value to be an array:
PdfArray kids = element.getAsArray(PdfName.K);
(Most likely he implemented that code with just the structure tree of the PDF in question there in mind.)
Thus, replace
PdfArray kids = element.getAsArray(PdfName.K);
if (kids == null) return;
for (int i = 0; i < kids.size(); i++)
    manipulate(kids.getAsDict(i));
by something like
PdfObject kid = element.getDirectObject(PdfName.K);
if (kid instanceof PdfDictionary) {
    manipulate((PdfDictionary) kid);
} else if (kid instanceof PdfArray) {
    PdfArray kids = (PdfArray) kid;
    for (int i = 0; i < kids.size(); i++)
        manipulate(kids.getAsDict(i));
}
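To illustrate why the single-dictionary case matters, here is a minimal, library-free sketch of the same traversal. It is only a model, not iText code: plain Java maps and lists stand in for PdfDictionary and PdfArray, an Integer stands in for a marked-content ID, and the names StructTreeWalkSketch, kidDictionaries, addAltToFigures and dict are hypothetical. Like the code in the referenced answer, it sets the alternative text on every Figure element it finds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: Map = dictionary, List = array, Integer = marked-content ID.
class StructTreeWalkSketch {

    // Normalizes a K value into a list of child dictionaries.
    // K may be a single dictionary, an array, or something else (e.g. an integer).
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> kidDictionaries(Object k) {
        List<Map<String, Object>> result = new ArrayList<>();
        if (k instanceof Map) {
            // Single child: the case an array-only walker never sees.
            result.add((Map<String, Object>) k);
        } else if (k instanceof List) {
            for (Object item : (List<?>) k) {
                if (item instanceof Map) {
                    result.add((Map<String, Object>) item);
                }
                // Non-dictionary entries (such as integers) are skipped.
            }
        }
        return result;
    }

    // Sets "Alt" on every Figure element below the given node and
    // returns how many elements were updated.
    static int addAltToFigures(Map<String, Object> element, String altText) {
        if (element == null) {
            return 0;
        }
        int changed = 0;
        if ("Figure".equals(element.get("S"))) {
            element.put("Alt", altText);
            changed++;
        }
        for (Map<String, Object> kid : kidDictionaries(element.get("K"))) {
            changed += addAltToFigures(kid, altText);
        }
        return changed;
    }

    // Small helper to build a mutable dictionary from key/value pairs.
    static Map<String, Object> dict(Object... keysAndValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            d.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return d;
    }

    public static void main(String[] args) {
        Map<String, Object> paragraph = dict("S", "P", "K", 0);
        Map<String, Object> figure = dict("S", "Figure", "K", 1);
        Map<String, Object> nestedFigure = dict("S", "Figure", "K", 2);
        // K holds a single dictionary here, not an array.
        Map<String, Object> section = dict("S", "Sect", "K", nestedFigure);
        Map<String, Object> document =
                dict("S", "Document", "K", Arrays.asList(paragraph, figure, section));
        // The root also points to a single dictionary.
        Map<String, Object> root = dict("K", document);

        int updated = addAltToFigures(root, "Figure without an Alt description");
        System.out.println("Figures updated: " + updated); // prints "Figures updated: 2"
    }
}
```

In this model a walker that only accepts arrays would stop at the root, because the root's K is a single dictionary, which matches the symptom in the question.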
As you did not share an example document, I could not test the code. If there are problems, please share an example PDF.
Upvotes: 1