garlicDoge

Reputation: 197

How to return the (total) page count of external PDF files via XSL

Is it possible to return the total page count of an external PDF file via XSL? Does the Antenna House Formatter have an equivalent extension?

Thanks in advance!

Upvotes: 0

Views: 997

Answers (2)

Toshihiko Makita

Reputation: 1304

If you are using a Java-based XSLT processor that allows external function calls (such as Saxon PE or EE), then Apache PDFBox will help you.

PDFBox: https://pdfbox.apache.org/

PDFBox's PDDocument class has a method that returns the page count of the target PDF, so you can get the page count with the following steps:

  1. Write a Java class with a static method.
  2. Call it from the XSLT stylesheet.

[Java sample code]

package com.acme.pdfutil;

import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;

// The class name stays lowercase because the stylesheet's namespace URI refers to it.
public class pdfDocument {
    /**
     * Get the page count of the specified PDF file.
     * @param filePath path to the PDF file
     * @return page count, or -1 if the file could not be read
     */
    public static int getPageCount(String filePath) {
        // PDDocument.load(File) is the PDFBox 2.x API; PDFBox 3.x uses Loader.loadPDF(File).
        // try-with-resources closes the document even when an exception is thrown.
        try (PDDocument pdfDoc = PDDocument.load(new File(filePath))) {
            return pdfDoc.getNumberOfPages();
        } catch (Exception e) {
            System.out.println("[getPageCount] " + e.getMessage());
            return -1;
        }
    }
}

[XSLT stylesheet]

<xsl:stylesheet version="2.0" 
 xmlns:fo="http://www.w3.org/1999/XSL/Format" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:acmejava="java:com.acme.pdfutil.pdfDocument"
>
…
<!-- Call external function -->
<xsl:variable name="pdfPageCount" as="xs:integer" select="acmejava:getPageCount($pdfPath)"/>
…

Upvotes: 2

Tony Graham

Reputation: 8068

Not out of the box, no. Ways to do it would include:

  • Use a command line tool such as pdftk (https://www.pdflabs.com/tools/pdftk-server/) that can report the number of pages. Before running the XSLT to create the FO, you could run the tool on the PDF and save the result to a file, and you would then read the file during the XSLT processing.
  • Less reliably, you could use grep, etc., on the PDF and save the output of that to a file to be read. See, e.g., http://www.unix.com/printthread.php?t=55661&pp=40
  • If you think that all your PDFs are readable as 'unparsed text' by XSLT, then you could read the PDF using unparsed-text() then use XSLT's regular expression ability to find the right string(s).
  • You could use the XSLT extensions from the Print and Page Layout Community Group (https://www.w3.org/community/ppl/wiki/XSLTExtensions) from within your XSLT to get the area tree from an FO file that just contains your external PDF and count the number of pages in that.
  • Before running your XSLT, you could run AHPDFXML from Antenna House (see https://www.antennahouse.com/antenna1/ahpdfxml-conversion-library/) to get an XML representation of your PDF, then your XSLT could count the number of pages in that XML.
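
As a rough illustration of the grep / unparsed-text() idea above, here is a minimal Java sketch that counts "/Type /Page" markers in the raw bytes of a PDF. The class and method names (PdfPageCountHeuristic, countPageMarkers) are made up for this example. It is only a heuristic: it assumes the page objects are stored uncompressed, so it will undercount PDFs that keep them inside compressed object streams, and stray matches inside other data could inflate the result.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox or any other library.
public class PdfPageCountHeuristic {

    // Matches "/Type /Page" or "/Type/Page", but not "/Type /Pages"
    // (the negative lookahead rejects a following letter).
    private static final Pattern PAGE_OBJECT =
            Pattern.compile("/Type\\s*/Page(?![A-Za-z])");

    /** Counts page-object markers in the raw text of a PDF. */
    public static int countPageMarkers(String rawPdf) {
        Matcher matcher = PAGE_OBJECT.matcher(rawPdf);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfPageCountHeuristic <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // ISO-8859-1 maps each byte to exactly one char, so binary
        // stream data cannot cause a decoding failure.
        String raw = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(countPageMarkers(raw));
    }
}
```

You could run this before the XSLT step and write the number to a file, or call countPageMarkers as a static extension function in the same way as the PDFBox example. For anything beyond a quick check, a real PDF parser is the safer choice.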

Upvotes: 2
