SaRaVaNaN

Reputation: 33

iText's Alt-Text adding sample code not working for PDFs tagged using Acrobat

I'm working on a PDF accessibility assignment: adding alternative text to images in a tagged PDF. I found sample code for this in the answer to "Add alternative text for an image in tagged pdf (PDF/UA) using iText".

I was excited that my task would be finished quickly, without much R&D.

I created a Java project based on that code, and when I ran it, it worked perfectly with the input PDF used in the iText example.

Unfortunately, the same source code is not working with PDFs tagged using Acrobat.

Sample Inputs: iText PDF: no_alt_attribute.pdf   &   My PDF: SARO_Sample_v1.7.pdf

Issue:

    // This line works and returns the root element
    PdfDictionary structTreeRoot = catalog.getAsDict(PdfName.STRUCTTREEROOT);

    // --> This line always returns null,
    //     instead of returning the child elements of the root element
    PdfArray kids = structTreeRoot.getAsArray(PdfName.K);
    // --> According to the structure, kids are present

I compared the structure of both PDFs, and these are my observations:

  1. Tagging Structure - exactly the same in both PDFs.
  2. Content Structure - almost the same, but the PDF I created has a few additional entries.
  3. Tag Tree Structure - almost the same with respect to tags, but with one major difference: the tags in iText's PDF are marked with /T:StructElem, whereas that is not found in my PDF. Even re-tagging doesn't help.

I checked various other tagged PDFs we have, and all of them are similar (without /T:StructElem). These PDFs have been validated and have passed accessibility compliance checks.

Need some thoughts on how to make this source code work with the PDFs we have. Alternatively, I need a way to ADD the missing /T:StructElem automatically in the PDFs while tagging in Acrobat.

Any help will be much appreciated!

Please do let me know if any further information is needed.

Note: I'm still not sure that adding this /T:StructElem will help, since the PDFs passed validation in PAC. If it were really a problem, those PDFs wouldn't have passed validation, right? But it is the only difference I found between the two PDFs.

PS: The Acrobat version I'm using is "Adobe Acrobat (Pro) DC."

-- Thanks,
SaRaVaNaN

Upvotes: 3

Views: 524

Answers (1)

mkl

Reputation: 95918

Bruno's code in the referenced answer does not walk the whole structure tree because he did not implement all cases of the K contents. The structure element K entry is specified like this:

The children of this structure element. The value of this entry may be one of the following objects or an array consisting of one or more of the following objects in any combination: [...]

(ISO 32000-2, Table 355 — Entries in a structure element dictionary)

Bruno's code, though, always assumes the value to be an array:

PdfArray kids = element.getAsArray(PdfName.K);

(Most likely he implemented that code with just the structure tree of the PDF in question there in mind.)

Thus, replace

PdfArray kids = element.getAsArray(PdfName.K);
if (kids == null) return;
for (int i = 0; i < kids.size(); i++)
    manipulate(kids.getAsDict(i));

by something like

PdfObject kid = element.getDirectObject(PdfName.K);
if (kid instanceof PdfDictionary) {
    manipulate((PdfDictionary)kid);
} else if (kid instanceof PdfArray) {
    PdfArray kids = (PdfArray)kid;
    for (int i = 0; i < kids.size(); i++)
        manipulate(kids.getAsDict(i));
}

As you did not share an example document, I could not test the code. If there are problems, please share an example PDF.
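
To see the "single object or array" rule in isolation, here is a minimal sketch that does not use iText at all. It models a structure element as a plain java.util.Map and its K entry as either one Map, a List of kids, or a non-element kid such as an integer. The class KEntryWalker, its helper methods, and the small tree in demo() are hypothetical names invented for illustration; they are not part of iText or of Bruno's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in model, NOT iText: a structure element is a Map, and its "K" entry
// may be a single Map, a List of kids, or a non-element kid such as an Integer.
class KEntryWalker {

    // Records the "S" (structure type) of every element reachable from 'element'.
    static void walk(Map<String, Object> element, List<String> visited) {
        if (element == null) return;
        Object type = element.get("S");
        if (type != null) visited.add(type.toString());

        Object kid = element.get("K");
        if (kid instanceof Map) {
            // K holds one structure element directly (the case an array-only walk misses)
            walk(asElement(kid), visited);
        } else if (kid instanceof List) {
            // K holds an array; only dictionary-like kids are structure elements
            for (Object k : (List<?>) kid) {
                if (k instanceof Map) walk(asElement(k), visited);
            }
        }
        // any other kind of K value (e.g. an integer) has no children to visit
    }

    @SuppressWarnings("unchecked")
    static Map<String, Object> asElement(Object o) {
        return (Map<String, Object>) o;
    }

    // Builds a model element; either argument may be null to leave the entry out.
    static Map<String, Object> element(String type, Object kids) {
        Map<String, Object> e = new LinkedHashMap<>();
        if (type != null) e.put("S", type);
        if (kids != null) e.put("K", kids);
        return e;
    }

    // Hypothetical tree: root -> Document (single dictionary) -> [P, Figure]
    static List<String> demo() {
        Map<String, Object> figure = element("Figure", 1);
        Map<String, Object> para = element("P", Arrays.asList(0));
        Map<String, Object> document = element("Document", Arrays.asList(para, figure));
        Map<String, Object> root = element(null, document);

        List<String> visited = new ArrayList<>();
        walk(root, visited);
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [Document, P, Figure]
    }
}
```

In this model the root's K is a single dictionary rather than an array, so a walk that only accepts arrays would stop at the root and never reach the Figure element.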

Upvotes: 1
