I'm trying to create a PDF document with better accessibility. It doesn't need to pass any standards, but I would like to add alt text to images. I can tag text properly, but I can't get images tagged when using a similar workflow. I referenced this post and its related post to create my current code.
My code so far:
public static void main(String[] args) throws IOException {
    int mcidCounter = 0;
    int structParentCounter = 0;

    // Create a one-page document and give the page its StructParents key.
    PDDocument document = new PDDocument();
    PDPage page = new PDPage(PDRectangle.A4);
    document.addPage(page);
    page.setStructParents(structParentCounter);

    PDPageContentStream contentStream = null;
    try {
        contentStream = new PDPageContentStream(document, page);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    PDImageXObject pdImage = PDImageXObject.createFromFile("image_file", document);

    // Catalog-level setup: structure tree root, viewer preferences, mark info.
    PDDocumentCatalog catalog = document.getDocumentCatalog();
    PDStructureTreeRoot structureTreeRoot = new PDStructureTreeRoot();
    catalog.setStructureTreeRoot(structureTreeRoot);
    PDViewerPreferences prefs = new PDViewerPreferences(new COSDictionary());
    prefs.setDisplayDocTitle(true);
    catalog.setViewerPreferences(prefs);
    PDMarkInfo markInfo = new PDMarkInfo();
    markInfo.setMarked(true);
    catalog.setMarkInfo(markInfo);

    // Structure hierarchy: Document -> P (title text).
    PDStructureElement documentElement = new PDStructureElement(StandardStructureTypes.DOCUMENT, structureTreeRoot);
    structureTreeRoot.appendKid(documentElement);
    PDStructureElement paragraphElement = new PDStructureElement(StandardStructureTypes.P, documentElement);
    paragraphElement.setPage(page);
    documentElement.appendKid(paragraphElement);

    // Tagged text: marked content with MCID 0, referenced from the P element.
    COSDictionary markedContentDictionary = new COSDictionary();
    markedContentDictionary.setInt(COSName.MCID, mcidCounter);
    PDMarkedContentReference mcr = new PDMarkedContentReference();
    mcr.setMCID(mcidCounter);
    paragraphElement.appendKid(mcr);
    contentStream.beginMarkedContent(COSName.P, PDPropertyList.create(markedContentDictionary));
    contentStream.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA_BOLD), 12);
    contentStream.beginText();
    contentStream.newLineAtOffset(50, 700);
    contentStream.showText("Document Title");
    contentStream.endText();
    contentStream.endMarkedContent();

    // Structure hierarchy: Document -> Figure (image with alt text).
    PDStructureElement figureElement = new PDStructureElement(StandardStructureTypes.Figure, documentElement);
    figureElement.setPage(page);
    figureElement.setAlternateDescription("Alternate Image Description");
    documentElement.appendKid(figureElement);

    // Tagged image: marked content with MCID 2, referenced from the Figure element.
    COSDictionary markedContentDictionary3 = new COSDictionary();
    markedContentDictionary3.setInt(COSName.MCID, mcidCounter + 2);
    markedContentDictionary3.setString(COSName.ALT, "Alternate Image Description");
    PDMarkedContentReference mcr3 = new PDMarkedContentReference();
    mcr3.setMCID(mcidCounter + 2);
    figureElement.appendKid(mcr3);
    contentStream.beginMarkedContent(COSName.IMAGE, PDPropertyList.create(markedContentDictionary3));
    contentStream.drawImage(pdImage, 50, 0);
    contentStream.endMarkedContent();
    contentStream.close();

    // Parent tree: maps the page's StructParents key to the paragraph element.
    COSDictionary parentTreeRoot = new COSDictionary();
    PDNumberTreeNode parentTree = new PDNumberTreeNode(parentTreeRoot, COSBase.class);
    Map<Integer, COSObjectable> parentTreeMap = new HashMap<>();
    parentTreeMap.put(structParentCounter, paragraphElement);
    parentTree.setNumbers(parentTreeMap);
    structureTreeRoot.setParentTree(parentTree);

    // Save to memory, then write the bytes to test.pdf.
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    document.save(outputStream);
    byte[] pdfBytes = outputStream.toByteArray();
    document.close();
    Path actualPath = Path.of("test.pdf");
    Files.write(actualPath, pdfBytes, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
}
This is the internal structure of the PDF.
I can't see any Structure tree even though I link it to the document catalog.
Is there something missing in the code or am I misunderstanding what I should be seeing?
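A side note on the parent tree, offered as a hedged sketch rather than a confirmed fix: as far as I understand the PDF specification, the parent tree entry stored under a page's StructParents key is an array indexed by MCID, where each slot points to the structure element that owns that marked-content sequence. The code above stores paragraphElement itself under key 0, and the figure uses MCID 2 with no MCID 1. The plain-Java model below uses no PDFBox types; ParentTreeSketch, entryForPage, and the string labels are hypothetical names chosen for illustration. It only shows the shape such an entry would have for MCIDs 0 and 2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of one parent tree entry; no PDFBox types are used here.
class ParentTreeSketch {

    // Builds the array stored under a page's StructParents key:
    // index = MCID, value = label of the structure element owning that MCID.
    static List<String> entryForPage(Map<Integer, String> ownerByMcid) {
        List<String> entry = new ArrayList<>();
        for (Map.Entry<Integer, String> e : ownerByMcid.entrySet()) {
            int mcid = e.getKey();
            // Pad with null so every element lands at the index equal to its MCID.
            while (entry.size() <= mcid) {
                entry.add(null);
            }
            entry.set(mcid, e.getValue());
        }
        return entry;
    }

    public static void main(String[] args) {
        Map<Integer, String> ownerByMcid = new LinkedHashMap<>();
        ownerByMcid.put(0, "P");       // the title paragraph uses MCID 0
        ownerByMcid.put(2, "Figure");  // the image uses MCID 2 (mcidCounter + 2)

        int structParents = 0;         // the value passed to page.setStructParents(...)
        Map<Integer, List<String>> parentTree = new LinkedHashMap<>();
        parentTree.put(structParents, entryForPage(ownerByMcid));

        System.out.println(parentTree); // prints {0=[P, null, Figure]}
    }
}
```

In PDFBox terms this would presumably correspond to a COSArray holding paragraphElement at index 0 and figureElement at index 2 (or at index 1 if the MCIDs were made consecutive). I have not verified whether that alone makes the structure tree appear.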