Tagging with itext7

Question

I am working on accessibility tags with iText 7 and I want a particular tag structure. I see that iText 7 has library methods for this, but I couldn't find any usage examples on the iText site or on other developer forums. I want the structure to be like this.

I see the TagTreePointer class and its methods, which are used for tagging, but I do not know how to use them.

I tried the sample code below to achieve this, but found some inconsistencies.

Document document = new Document(pdf);
pdf.setTagged();
pdf.getCatalog().setViewerPreferences(new PdfViewerPreferences().setDisplayDocTitle(true));
pdf.getCatalog().setLang(new PdfString("en-US"));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("English pangram");
Paragraph p = new Paragraph("Tested");
p.getAccessibilityProperties().setRole("H");
Paragraph p2 = new Paragraph("Child H1");
p2.getAccessibilityProperties().setRole("H1");
document.add(p.add(p2.add(new Paragraph("Testing ChildChild"))));
document.close();

I am adding a paragraph to the header paragraph, and I see that the added paragraphs are appended to one another. What is the right way to do this?

Bruno Lowagie · Accepted Answer

I have four examples for you.

Example 1 is the simple one:

public void createPdf(String dest) throws IOException {
    PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
    pdf.setTagged();
    pdf.getCatalog().setViewerPreferences(new PdfViewerPreferences().setDisplayDocTitle(true));
    pdf.getCatalog().setLang(new PdfString("en-US"));
    PdfDocumentInfo info = pdf.getDocumentInfo();
    info.setTitle("Testing tags");
    Document document = new Document(pdf);
    Paragraph p = new Paragraph("Tested");
    p.getAccessibilityProperties().setRole("H");
    Paragraph p2 = new Paragraph("Child H1");
    p2.getAccessibilityProperties().setRole("H1");
    document.add(p).add(p2).add(new Paragraph("Testing ChildChild"));
    document.close();
}

This results in the following PDF:

The odd thing about the structure is that you are mixing header tags. When you use H, I don't expect you to use H1 as well. I would expect you either to use H with only one level of headers or, if you need more levels, to use H1, H2, and so on.
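To make that rule concrete, here is a minimal sketch in plain Java (it makes no iText calls) that flags a list of role names mixing the generic H with the numbered H1 to H6. The class HeadingRoleCheck and the method mixesHeadingStyles are hypothetical helpers for illustration, not part of any iText API. The role strings are the ones used in the example above, plus P, which is assumed here to be the role of an ordinary paragraph.

```java
import java.util.List;

public class HeadingRoleCheck {

    /** Returns true if the roles contain both the generic "H" and a numbered "H1".."H6". */
    static boolean mixesHeadingStyles(List<String> roles) {
        boolean generic = false;
        boolean numbered = false;
        for (String role : roles) {
            if (role.equals("H")) {
                generic = true;
            } else if (role.matches("H[1-6]")) {
                numbered = true;
            }
        }
        return generic && numbered;
    }

    public static void main(String[] args) {
        // Roles as assigned in Example 1: H, H1 and an ordinary paragraph (assumed P).
        System.out.println(mixesHeadingStyles(List.of("H", "H1", "P")));
        // Numbered heading levels only.
        System.out.println(mixesHeadingStyles(List.of("H1", "H2", "P")));
    }
}
```

The first call reports a mix and the second does not, which matches the advice above: pick one heading style and stay with it.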

You also notice that your comment doesn't stand the test of reality. You wrote:

I changed the code to document.add(p).add(p2).add(new Paragraph("Testing ChildChild")); and I see that the new paragraphs I add do not show on a new line. I want each paragraph to be on a new line.

However, if you look at the screenshot, you can clearly see that every paragraph starts on a new line. Please avoid posting comments that can easily be proven false; accurate comments make it more likely that people will help you out.

If you want more structure layers, you can introduce a Div:

public void createPdf(String dest) throws IOException {
    PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
    pdf.setTagged();
    pdf.getCatalog().setViewerPreferences(new PdfViewerPreferences().setDisplayDocTitle(true));
    pdf.getCatalog().setLang(new PdfString("en-US"));
    PdfDocumentInfo info = pdf.getDocumentInfo();
    info.setTitle("Testing tags");
    Document document = new Document(pdf);
    Paragraph p = new Paragraph("Tested");
    p.getAccessibilityProperties().setRole("H");
    Div divH = new Div().add(p);
    Paragraph p2 = new Paragraph("Child H1");
    p2.getAccessibilityProperties().setRole("H1");
    Div divH1 = new Div().add(p2);
    divH1.add(new Paragraph("Testing ChildChild"));
    divH.add(divH1);
    document.add(divH);
    document.close();
}

The result looks like this:

That looks more convoluted, especially for an example as simple as this, but if your document is bigger, this extra structure might be helpful.
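To compare the two shapes side by side, the following plain-Java sketch (it does not call iText) models a tag tree as nested nodes and prints the flat structure that Example 1 should produce next to the nested one from the Div example. TagTreeSketch, Tag, flat and nested are hypothetical names for illustration, and the role names Document, Div and P are an assumption about the default roles; the actual tree in your PDF should be checked in a tag viewer.

```java
import java.util.List;

public class TagTreeSketch {

    /** A structure element: a role name plus its child elements. */
    static final class Tag {
        final String role;
        final List<Tag> kids;

        Tag(String role, Tag... kids) {
            this.role = role;
            this.kids = List.of(kids);
        }

        /** Renders the subtree, indenting two spaces per nesting level. */
        String render(int depth) {
            StringBuilder sb = new StringBuilder("  ".repeat(depth)).append(role).append('\n');
            for (Tag kid : kids) {
                sb.append(kid.render(depth + 1));
            }
            return sb.toString();
        }
    }

    /** Flat shape: every element is a direct child of the root. */
    static Tag flat() {
        return new Tag("Document", new Tag("H"), new Tag("H1"), new Tag("P"));
    }

    /** Nested shape: the outer Div holds H and an inner Div, which holds H1 and P. */
    static Tag nested() {
        return new Tag("Document",
                new Tag("Div",
                        new Tag("H"),
                        new Tag("Div", new Tag("H1"), new Tag("P"))));
    }

    public static void main(String[] args) {
        System.out.print(flat().render(0));
        System.out.print(nested().render(0));
    }
}
```

In the nested shape, the inner Div groups the H1 heading with the paragraph that belongs to it, which is the kind of grouping that becomes useful once a document has many sections.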

In my comment, I referred to HTML because tagging in PDF mimics tagging in HTML. When iText was rewritten from scratch, it was rewritten with HTML in mind.

I know that you wrote:

We are not creating any HTML tags. Instead we are getting the data from DB and inserting into paragraphs.

I think you missed my point there. I merely wanted to explain that, no matter how you created your tagged PDF, it's always good to keep in mind how content could be tagged in HTML.

Take for instance this HTML content:

Introduction
TOC
List
Appendix
Heading
Description

Now run this code:

public void createPdf(String baseUri, String src, String dest) throws IOException {
    PdfWriter writer = new PdfWriter(dest);
    PdfDocument pdf = new PdfDocument(writer);
    pdf.setTagged();
    HtmlConverter.convertToPdf(new FileInputStream(src), pdf);
}

The result will be:

That's very similar to the first example.

Now if we add some extra structure like this:


    Introduction
    
        TOC
        List
    


    Appendix
    
        Heading
        Description

We get this result (using the same code):

This structure looks more like the second example.

I think you misunderstood my comment about HTML. I use HTML to model my code. It's much easier to tweak HTML, convert to PDF, and look at the resulting tag structure, than it is to constantly change my Java code, compile and run that code, and then look at the result.

I was suggesting that you experiment with HTML even if your application doesn't need HTML. Experimenting with HTML helps you make decisions about the structure.
