Reputation: 226
How to retrieve font type style attributes from a PDF using PDFBox
Upvotes: 2
Views: 4600
Reputation: 458
If you want to get the font of a single character in the PDF document, you can call textPosition.getFont().getFontDescriptor().getFontName(), where textPosition is an instance of the class TextPosition.
Every character that PDFBox extracts from a PDF document is represented by a TextPosition object.
You can get the TextPosition objects of a PDF document either by overriding the processTextPosition(TextPosition t) method of PDFTextStripper or with the getCharactersByArticle() method of PDFTextStripper.
For the latter, extend the PDFTextStripper class like this:
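// Imports assume PDFBox 1.x, which the Vector return type suggests.
import java.io.IOException;
import java.util.List;
import java.util.Vector;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;

public class MyPDFTextStripper extends PDFTextStripper {
    public MyPDFTextStripper() throws IOException {
        super();
    }

    // getCharactersByArticle() is protected, so expose it through a public wrapper.
    public Vector<List<TextPosition>> myGetCharactersByArticle() {
        return getCharactersByArticle();
    }
}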
... to get the list of TextPositions for a single page use:
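MyPDFTextStripper stripper = new MyPDFTextStripper();
PDDocument doc = PDDocument.load(new File(filename));
// PDFTextStripper counts pages from 1; pageNr is assumed to be zero-based.
stripper.setStartPage(pageNr + 1);
stripper.setEndPage(pageNr + 1);
// getText() must run first: it processes the page and fills the character lists.
stripper.getText(doc);
Vector<List<TextPosition>> list = stripper.myGetCharactersByArticle();
// Call doc.close() once you are done with the document.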
... and finally to get the font for a single character just type:
textPosition.getFont().getFontDescriptor().getFontName()
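The name returned by getFontName() is only a string, so if you need style attributes such as bold or italic you have to derive them yourself. PDFontDescriptor also has flag-based accessors such as isItalic() and getFontWeight(), but in my experience many PDFs leave those values unset, so the name is often the only hint. Below is a minimal sketch of a name-based heuristic. The class FontNameStyle and its methods are hypothetical helpers, not part of PDFBox, and the keyword matching is an assumption that will not hold for every font. The one firm rule it relies on is that embedded font subsets are named with a six-uppercase-letter tag followed by a plus sign (for example "ABCDEF+Arial-BoldMT").

```java
import java.util.Locale;

/** Hypothetical helper (not part of PDFBox): guesses style from a PDF font name. */
public class FontNameStyle {

    /** Removes a subset tag such as "ABCDEF+" (six uppercase letters plus '+'). */
    public static String stripSubsetTag(String fontName) {
        if (fontName != null && fontName.matches("[A-Z]{6}\\+.+")) {
            return fontName.substring(7);
        }
        return fontName;
    }

    /** Heuristic: treats any name containing "bold" as bold (assumption). */
    public static boolean looksBold(String fontName) {
        if (fontName == null) {
            return false;
        }
        String name = stripSubsetTag(fontName).toLowerCase(Locale.ROOT);
        return name.contains("bold");
    }

    /** Heuristic: treats names containing "italic" or "oblique" as italic (assumption). */
    public static boolean looksItalic(String fontName) {
        if (fontName == null) {
            return false;
        }
        String name = stripSubsetTag(fontName).toLowerCase(Locale.ROOT);
        return name.contains("italic") || name.contains("oblique");
    }

    public static void main(String[] args) {
        String name = "ABCDEF+Arial-BoldItalicMT";
        System.out.println(stripSubsetTag(name)); // prints "Arial-BoldItalicMT"
        System.out.println(looksBold(name));      // prints "true"
        System.out.println(looksItalic(name));    // prints "true"
    }
}
```

You would pass it the string from textPosition.getFont().getFontDescriptor().getFontName(). Note that the null checks matter, since a font may not provide a name at all.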
Upvotes: 1