satish john

Reputation: 226

PDFBox: how to extract font type and style from a PDF

How can I retrieve font type and style attributes from a PDF using PDFBox?

Upvotes: 2

Views: 4600

Answers (1)

matthiasboesinger

Reputation: 458

If you want to get the font of a single character in the PDF document, you can call textPosition.getFont().getFontDescriptor().getFontName(), where textPosition is an instance of the class TextPosition.
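getFontName() returns the font's PostScript name. For embedded subsets that name starts with a six-letter tag such as "ABCDEF+", and many fonts also spell their style into the name (for example "Arial-BoldItalicMT"). Below is a minimal sketch, assuming you only have that name string. FontNameInfo and parse() are hypothetical helpers, and guessing bold/italic from the name is a heuristic, not something PDFBox does for you.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper: derives a base name plus bold/italic hints from a
// PostScript font name such as "ABCDEF+Arial-BoldItalicMT".
// The style detection is a naming heuristic, not a PDFBox feature.
final class FontNameInfo {
    // Subset fonts are prefixed with six uppercase letters and a '+'
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    final String baseName;
    final boolean bold;
    final boolean italic;

    private FontNameInfo(String baseName, boolean bold, boolean italic) {
        this.baseName = baseName;
        this.bold = bold;
        this.italic = italic;
    }

    static FontNameInfo parse(String fontName) {
        // Drop the subset tag, if present
        String name = SUBSET_TAG.matcher(fontName).replaceFirst("");
        String lower = name.toLowerCase(Locale.ROOT);
        // Heuristic: look for common style words in the name
        boolean bold = lower.contains("bold");
        boolean italic = lower.contains("italic") || lower.contains("oblique");
        return new FontNameInfo(name, bold, italic);
    }
}
```

Because it works only on the name, this misses fonts whose names do not mention their style; the font descriptor flags are a complementary source.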

Every character that PDFTextStripper extracts from a PDF document is represented by a TextPosition object.

You can get the TextPosition objects of a PDF document either by overriding the processTextPosition(TextPosition t) method of PDFTextStripper or by calling its getCharactersByArticle() method.
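If you go the processTextPosition route, a common next step is to merge consecutive characters that share a font into runs, so you can report "this word is bold" rather than per-glyph data. The sketch below is plain Java so it runs without PDFBox: Glyph is a hypothetical stand-in for TextPosition, where in PDFBox 2.x the text would come from t.getUnicode() and the font name from t.getFont().getName().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Sketch: merge consecutive characters that share a font into runs.
// Glyph is a hypothetical stand-in for PDFBox's TextPosition.
final class FontRuns {
    static final class Glyph {
        final String text;      // e.g. t.getUnicode() in PDFBox 2.x
        final String fontName;  // e.g. t.getFont().getName() in PDFBox 2.x

        Glyph(String text, String fontName) {
            this.text = text;
            this.fontName = fontName;
        }
    }

    static final class Run {
        final String text;
        final String fontName;

        Run(String text, String fontName) {
            this.text = text;
            this.fontName = fontName;
        }
    }

    private FontRuns() {}

    static List<Run> group(List<Glyph> glyphs) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentFont = null;
        boolean open = false;
        for (Glyph g : glyphs) {
            // A font change closes the current run and starts a new one
            if (open && !Objects.equals(currentFont, g.fontName)) {
                runs.add(new Run(current.toString(), currentFont));
                current.setLength(0);
            }
            currentFont = g.fontName;
            current.append(g.text);
            open = true;
        }
        // Flush the last run, if any characters were seen
        if (open) {
            runs.add(new Run(current.toString(), currentFont));
        }
        return runs;
    }
}
```

In a real stripper you would collect Glyph-like entries inside processTextPosition (calling super so normal extraction still happens) and group them after getText() returns.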

getCharactersByArticle() is protected, so to use the latter, extend the PDFTextStripper class like this:

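import java.io.IOException;
import java.util.List;

// PDFBox 2.x packages; in PDFBox 1.8 both classes live in org.apache.pdfbox.util
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

public class MyPDFTextStripper extends PDFTextStripper {

    public MyPDFTextStripper() throws IOException {
        super();
    }

    // Exposes the protected getCharactersByArticle(); the lists are only
    // filled after getText() has been called
    public List<List<TextPosition>> myGetCharactersByArticle() {
        return getCharactersByArticle();
    }
}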

To get the list of TextPosition objects for a single page, use:

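MyPDFTextStripper stripper = new MyPDFTextStripper();
// pageNr is zero-based; setStartPage/setEndPage expect 1-based page numbers
try (PDDocument doc = PDDocument.load(new File(filename))) {
    stripper.setStartPage(pageNr + 1);
    stripper.setEndPage(pageNr + 1);
    // getText() runs the extraction and fills the per-article character lists
    stripper.getText(doc);
    List<List<TextPosition>> list = stripper.myGetCharactersByArticle();
    // ... use the TextPosition objects here, while the document is still open
}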

Finally, to get the font name of a single character, call:

textPosition.getFont().getFontDescriptor().getFontName()
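For style information beyond the name, the font descriptor also carries a /Flags bit field defined by the PDF specification. PDFontDescriptor exposes the raw value via getFlags() along with convenience methods such as isItalic() and isForceBold(). Note that getFontDescriptor() can return null (for example for some Type 3 fonts), so check it before chaining calls. The hypothetical FontFlags helper below is only a sketch of how the bits map to names, using the specification's 1-based bit numbering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decodes a font descriptor's /Flags value into names.
// Bit numbers follow the PDF specification, where bit 1 is the lowest bit.
// With PDFBox the raw value would come from PDFontDescriptor.getFlags().
final class FontFlags {
    private static final int[] BITS = {1, 2, 3, 4, 6, 7, 17, 18, 19};
    private static final String[] NAMES = {
        "FixedPitch", "Serif", "Symbolic", "Script", "Nonsymbolic",
        "Italic", "AllCap", "SmallCap", "ForceBold"
    };

    private FontFlags() {}

    static List<String> decode(int flags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < BITS.length; i++) {
            // Bit n (1-based) corresponds to the mask 1 << (n - 1)
            if ((flags & (1 << (BITS[i] - 1))) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }
}
```

The ForceBold flag is typically not a reliable bold indicator on its own, since many bold fonts leave it unset; combining it with getFontWeight() or the font name is usually more robust.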

Upvotes: 1
