Reputation: 12207
I want to extract all bold text from a DOCX file using docx4j, but I get a ClassCastException with this code:
import java.util.List;
import javax.xml.bind.JAXBException;
import org.docx4j.Docx4J;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.wml.Text;

public class Main
{
    public static void main(String[] args) throws Docx4JException, JAXBException
    {
        var wordMLPackage = Docx4J.load(new java.io.File("input.docx"));
        var doc = wordMLPackage.getMainDocumentPart();
        System.out.println((Text) doc.getJAXBNodesViaXPath("//w:r[w:rPr/w:b]/w:t", false).get(0));
    }
}
The error is:
Exception in thread "main" java.lang.ClassCastException: class javax.xml.bind.JAXBElement cannot be cast to class org.docx4j.wml.Text (javax.xml.bind.JAXBElement and org.docx4j.wml.Text are in unnamed module of loader 'app')
    at Main.main(Main.java:37)
Why is an occurrence of "w:t" not an instance of org.docx4j.wml.Text, and how do I get the text instead?
Upvotes: 0
Views: 875
Reputation: 12207
Apparently the approach I tried only works for the R (run) element. The w:t nodes seem to be wrapped one level further: as the exception shows, each one comes back as a JAXBElement rather than as a Text. I could extract the text as follows:
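// requires: import javax.xml.bind.JAXBElement;
// the first getValue() unwraps the JAXBElement, the second returns the string held by the Text
System.out.println(
    ((JAXBElement<Text>) doc.getJAXBNodesViaXPath("//w:r[w:rPr/w:b]/w:t", false)
        .get(0)).getValue().getValue());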
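To check what the XPath expression itself selects, separately from the JAXB wrapping, here is a minimal sketch that uses only the JDK's DOM and XPath classes instead of docx4j. The class name BoldTextXPath, the method extractBoldText and the SAMPLE fragment are made up for illustration. The fragment is a hand-written stand-in for word/document.xml, not output from a real file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BoldTextXPath {
    // WordprocessingML main namespace, bound to the "w" prefix below
    public static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: two bold runs and one plain run
    public static final String SAMPLE =
        "<w:document xmlns:w=\"" + W_NS + "\"><w:body><w:p>"
        + "<w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>"
        + "<w:r><w:t>plain</w:t></w:r>"
        + "<w:r><w:rPr><w:i/><w:b/></w:rPr><w:t>also bold</w:t></w:r>"
        + "</w:p></w:body></w:document>";

    // Returns the text of every w:t whose parent run has a w:b in its w:rPr
    public static List<String> extractBoldText(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for prefixed XPath steps
        Document dom = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return W_NS.equals(uri) ? "w" : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                List<String> prefixes = new ArrayList<>();
                if (W_NS.equals(uri)) prefixes.add("w");
                return prefixes.iterator();
            }
        });

        // Same expression as in the question
        NodeList nodes = (NodeList) xpath.evaluate(
            "//w:r[w:rPr/w:b]/w:t", dom, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractBoldText(SAMPLE)); // prints [bold, also bold]
    }
}
```

Here the matches are plain DOM nodes, so no unwrapping is needed. With docx4j the same expression matches the same elements, only delivered as JAXBElement wrappers. Note that the expression only tests for the presence of w:b in the run's own properties, so it would also match a run that switches bold off explicitly (for example w:val="false") and would miss bold that comes from a style.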
Upvotes: 0