user1368476

Reputation: 3

Why does Apache POI XWPF (MS Word) not provide Font Size?

My Java application is using Apache POI XWPF to parse an MS Word docx file.

The program iterates through each XWPFRun of each XWPFParagraph within an XWPFDocument in the following way:

        String fileName = "C:\\<yourFile>.docx";
        try (XWPFDocument doc = new XWPFDocument(Files.newInputStream(Paths.get(fileName)))) {
            List<XWPFParagraph> paragraphs = doc.getParagraphs();
            for (XWPFParagraph paragraph : paragraphs) {
                List<XWPFRun> runs = paragraph.getRuns();
                for (XWPFRun run : runs) {
                    System.out.print(run.getText(0));
                    System.out.print("| name " + " : " + run.getFontName());
                    System.out.print("| size " + " : " + run.getFontSize());
                    System.out.print("| sizeD " + " : " + run.getFontSizeAsDouble());
                    System.out.print("| sizeC " + " : " + run.getComplexScriptFontSizeAsDouble());
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

Intermittently, an XWPFRun returns:

getFontSize() = -1;

getFontSizeAsDouble() = null;

getComplexScriptFontSizeAsDouble() = null;

Nevertheless, getText(0), getFontName(), isBold(), etc. each return what can be seen in the document through a docx client. The client also displays the fragment with a font size of 12.

I'm using POI version 5.2.5.

The relevant docx tags are <sz> and <szCs>, which are children of <rPr>, which in turn is a child of <r>.

What confuses me is that when the font size is returned as -1, the <sz> and <szCs> tags are absent from the run, yet the client still displays the text at a particular size.

The text is not only displayed, but displayed at the size I would expect given the structure of the document. The document is structured not by heading styles but by variation in display characteristics (font, bold, italics, etc.). The client APPEARS to find the last text fragment with identical display characteristics other than size and to inherit the size from that fragment. But I'm guessing!

Please note that, as suggested in another post, I tried document.getStyles().getDefaultRunStyle().getFontSize(), but it does not return the displayed font size.

Also note that the associated styles.xml has the following fragment:

<w:style w:type="paragraph" w:default="1" w:styleId="Normal">
        <w:name w:val="Normal"/>
        <w:next w:val="Normal"/>
        <w:pPr/>
        <w:rPr>
            <w:sz w:val="24"/>

This DOES contain the font size I'm after, but the only useful thing that

doc.getStyles().getDefaultParagraphStyle()

returns is getSpacingAfter(). The API documentation refers to overriding classes that I would hope return 12 (i.e. 24/2), but I have no indication of what they are.
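
To make the 24 -> 12 relationship concrete, here is a minimal sketch that reads the default paragraph style's <w:sz> straight from a styles XML string with the JDK's DOM parser and halves it (the value is stored in half-points). It does not use POI at all. The class name DefaultParagraphStyleSize, the method name, and the completed XML fragment in main are illustrative assumptions; the fragment quoted above is truncated, so the closing tags are filled in only for this demo. It also assumes w:val is a plain integer and w:default is written as "1".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Apache POI.
class DefaultParagraphStyleSize {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the font size of the default paragraph style in points, or null if it has no w:sz. */
    static Double defaultParagraphFontSize(String stylesXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document dom = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(stylesXml.getBytes(StandardCharsets.UTF_8)));
            NodeList styles = dom.getElementsByTagNameNS(W_NS, "style");
            for (int i = 0; i < styles.getLength(); i++) {
                Element style = (Element) styles.item(i);
                // Assumption: the default flag is written as "1" (it could also be "true").
                boolean isDefaultParagraph = "paragraph".equals(style.getAttributeNS(W_NS, "type"))
                        && "1".equals(style.getAttributeNS(W_NS, "default"));
                if (!isDefaultParagraph) {
                    continue;
                }
                NodeList sizes = style.getElementsByTagNameNS(W_NS, "sz");
                if (sizes.getLength() == 0) {
                    return null; // style exists but defines no font size
                }
                String halfPoints = ((Element) sizes.item(0)).getAttributeNS(W_NS, "val");
                return Integer.parseInt(halfPoints) / 2.0; // w:sz is stored in half-points
            }
            return null; // no default paragraph style found
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse styles XML", e);
        }
    }

    public static void main(String[] args) {
        // Completed version of the fragment from the question (closing tags are assumed).
        String stylesXml =
              "<w:styles xmlns:w=\"" + W_NS + "\">"
            + "<w:style w:type=\"paragraph\" w:default=\"1\" w:styleId=\"Normal\">"
            + "<w:name w:val=\"Normal\"/><w:next w:val=\"Normal\"/><w:pPr/>"
            + "<w:rPr><w:sz w:val=\"24\"/></w:rPr>"
            + "</w:style></w:styles>";
        System.out.println(defaultParagraphFontSize(stylesXml)); // prints 12.0
    }
}
```

This only demonstrates where the value lives and how it is scaled; it ignores style inheritance (basedOn) and document defaults.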

Upvotes: 0

Views: 125

Answers (1)

Axel Richter

Reputation: 61945

Microsoft Word has many ways to determine the font size of text.

The simplest is direct formatting of the text run. That value is available via XWPFRun.getFontSizeAsDouble, which returns the font size or null if no size is set on the run.

If the font size is not set on the run, a default font size is used.

The default font size may be stored in the run properties of a paragraph style. The ID of the paragraph style is available via XWPFParagraph.getStyleID; the style itself can then be fetched via XWPFDocument.getStyles -> XWPFStyles.getStyle. A paragraph style may exist without run properties, however. In that case it may have a linked style, which we check via XWPFStyle.getLinkStyleID. If neither the paragraph style nor the linked style has run properties, the default run style is used: XWPFStyles.getDefaultRunStyle.

If no paragraph style is assigned directly, the paragraph style named "Normal" is used. XWPFStyles.getStyleWithName("Normal") finds this style. If that style has run properties, they are used.

Unfortunately, XWPFStyle does not provide a method to get the font size from its run properties, so one must be written. The following method, Double getFontSizeAsDouble(XWPFStyle style), is adapted from XWPFDefaultRunStyle.getFontSizeAsDouble.

If the text run is not formatted directly and no style defines the run properties, the default run style should be used.

If all else fails, I would assume a font size of 11.
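
The lookup order described above is a "first non-null wins" chain. As a minimal sketch of that idea only, independent of POI: each source of a font size is modelled as a Supplier<Double> that returns null when it has nothing to say, and the first non-null answer is taken. The class FontSizeFallback, the method resolve, and the sample values in main are hypothetical names and data chosen for illustration; the last-resort value 11 is the assumption stated above.

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating the fallback order; not part of Apache POI.
class FontSizeFallback {
    static final double LAST_RESORT_SIZE = 11.0; // assumed default if nothing defines a size

    /** Returns the first non-null size from the candidates, in order, else the last-resort size. */
    @SafeVarargs
    static double resolve(Supplier<Double>... candidates) {
        for (Supplier<Double> candidate : candidates) {
            Double size = candidate.get();
            if (size != null) {
                return size;
            }
        }
        return LAST_RESORT_SIZE;
    }

    public static void main(String[] args) {
        // Sample values: the run and the paragraph style define nothing, "Normal" defines 12.
        Double directRunSize = null;
        Double paragraphStyleSize = null;
        Double linkedStyleSize = null;
        Double normalStyleSize = 12.0;
        Double defaultRunStyleSize = 10.0;

        double size = resolve(
                () -> directRunSize,        // 1. direct run formatting
                () -> paragraphStyleSize,   // 2. run properties of the paragraph style
                () -> linkedStyleSize,      // 3. run properties of the linked style
                () -> normalStyleSize,      // 4. run properties of the "Normal" style
                () -> defaultRunStyleSize); // 5. default run style
        System.out.println(size); // prints 12.0
    }
}
```

Using suppliers means later lookups are only evaluated if the earlier ones return null, and adding another source (for example a basedOn parent style) is one more argument rather than another nested if.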

Complete Example:

import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;

public class WordGetFontSizeOfRuns {
    
 public static Double getFontSizeAsDouble(XWPFStyle style) {
  org.openxmlformats.schemas.wordprocessingml.x2006.main.CTRPr rPr = style.getCTStyle().getRPr();
  if (rPr != null && rPr.sizeOfSzArray() > 0) {
   java.math.BigDecimal bDSize = java.math.BigDecimal.valueOf(
    org.apache.poi.util.Units.toPoints(
     org.apache.poi.ooxml.util.POIXMLUnits.parseLength(rPr.getSzArray(0).xgetVal())))
     .divide(java.math.BigDecimal.valueOf(4), 1, java.math.RoundingMode.HALF_UP);
   return bDSize == null ? null : bDSize.doubleValue();
  }
  return null;
 }
  
 public static Double getFontSizeAsDouble(XWPFRun run) {
  Double dFontSize = run.getFontSizeAsDouble(); // direct styled run properties
  if (dFontSize == null) {
   XWPFStyles styles = run.getDocument().getStyles();
   XWPFParagraph paragraph = (XWPFParagraph)run.getParent();
   if (styles != null) { // there are styles
    XWPFDefaultRunStyle defaultRunStyle = styles.getDefaultRunStyle();
    if (paragraph.getStyleID() != null) { 
     XWPFStyle style = styles.getStyle(paragraph.getStyleID());
     if (style != null) { // paragraph has style
      dFontSize = getFontSizeAsDouble(style); // styled by paragraph style run properties
     }
     if (dFontSize == null) { // paragraph style has no run properties
      if (style != null && style.getLinkStyleID() != null) { // maybe linked style has run properties
       style = styles.getStyle(style.getLinkStyleID());
       if (style != null) dFontSize = getFontSizeAsDouble(style); // styled by run properties linked in paragraph style
      }
      if (dFontSize == null) { // linked style also has no run properties
       if (defaultRunStyle != null) {
        dFontSize = defaultRunStyle.getFontSizeAsDouble(); // styled by default run style run properties
       }
      }
     }
    }
    if (dFontSize == null) {
     XWPFStyle style = styles.getStyleWithName("Normal");
     if (style != null) {
      dFontSize = getFontSizeAsDouble(style); // styled by "Normal" paragraph style run properties
     }
    }
    if (dFontSize == null) {
     if (defaultRunStyle != null) {
      dFontSize = defaultRunStyle.getFontSizeAsDouble(); // styled by default run style run properties
     }
    }
   }
   if (dFontSize == null) { // if all fails, then 11 as default
    dFontSize = 11d;
   }
  }
  return dFontSize;
 }

 public static void main(String[] args) throws Exception {

  // try-with-resources closes the document even if an exception is thrown
  try (XWPFDocument document = new XWPFDocument(new FileInputStream("./WordDocument.docx"))) {

   for (IBodyElement bodyElement : document.getBodyElements()) {
    if (bodyElement instanceof XWPFParagraph) {
     XWPFParagraph paragraph = (XWPFParagraph) bodyElement;
     for (IRunElement runElement : paragraph.getIRuns()) {
      if (runElement instanceof XWPFRun) {
       XWPFRun run = (XWPFRun) runElement;
       System.out.println(run);
       Double dFontSize = getFontSizeAsDouble(run);
       System.out.println(dFontSize);
      }
     }
    }
   }

  }
 }
}

I hope that works and considers all possibilities, but I doubt the latter. As more cases turn up, the code of Double getFontSizeAsDouble(XWPFRun run) will get longer and longer...

Upvotes: 2
