Reputation: 181
I am using Java and the Apache POI library to parse a PowerPoint slide. I can extract the shapes and connectors, but I am having difficulty extracting the text inside each shape. Here is sample code that gets the shapes; this part works fine.
    XMLSlideShow ppt = new XMLSlideShow(new FileInputStream(file));
    List<XSLFSlide> slide = ppt.getSlides();
    System.out.println("These are the shapes in the presentation: ");
    for (int i = 0; i < slide.size(); i++) {
        List<XSLFShape> listOfShapes = slide.get(i).getShapes();
        for (int j = 0; j < listOfShapes.size(); j++) {
            XSLFShape thisShape = listOfShapes.get(j);
            String thisShapeName = thisShape.getShapeName();
            int thisShapeID = thisShape.getShapeId();
            XSLFShapeContainer thisShapeParent = thisShape.getParent();
            Rectangle2D thisAnchor = thisShape.getAnchor();
            // String textBody = thisShape.???;  // this is the call I cannot find
            System.out.println("Name: " + thisShapeName + " ID: " + thisShapeID + " Anchor: " + thisAnchor.toString());
        }
    }
Based on what I read about the XSLFTextShape class and elsewhere, I thought I could get the text on each shape with:
    String textOnShape = thisShape.getTextBody();
But getTextBody does not appear to be an available method on XSLFShape. I have read the question and answer covering this for Apache POI HSLF, but I am using XSLF (the newer API, for .pptx files). I am probably missing something obvious with the syntax; if anyone has done this before and has a suggestion, it would be appreciated.
Upvotes: 0
Views: 366
Reputation: 181
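I eventually figured this out. While iterating, you need to cast the shape object down to XSLFTextShape. A direct cast is enough (no intermediate cast to XSLFSimpleShape is needed), and checking the type first avoids a ClassCastException on shapes that carry no text, such as connectors:
    XSLFShape thisShape = listOfShapes.get(j);
    // Only text-bearing shapes are XSLFTextShape; connectors and pictures are not.
    if (thisShape instanceof XSLFTextShape) {
        XSLFTextShape thisTextShape = (XSLFTextShape) thisShape;
        System.out.println(thisTextShape.getText());
    }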
That will get you the text sitting on the shape itself.
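To illustrate why the cast is needed and why it should be guarded, here is a minimal sketch that runs without POI on the classpath. The classes Shape, SimpleShape, TextShape and ConnectorShape are hypothetical stand-ins that only mirror the relationship between XSLFShape, XSLFSimpleShape, XSLFTextShape and XSLFConnectorShape; collectText is likewise an illustrative helper, not a POI method. The point is that getText() lives on the text subclass only, so a variable typed as the base class has to be checked and downcast before the call.

```java
import java.util.ArrayList;
import java.util.List;

public class ShapeTextSketch {
    // Hypothetical stand-in for XSLFShape: the base type has no text accessor.
    static abstract class Shape {
        final String name;
        Shape(String name) { this.name = name; }
    }

    // Hypothetical stand-in for XSLFSimpleShape.
    static class SimpleShape extends Shape {
        SimpleShape(String name) { super(name); }
    }

    // Hypothetical stand-in for XSLFTextShape: the only type exposing getText().
    static class TextShape extends SimpleShape {
        private final String text;
        TextShape(String name, String text) { super(name); this.text = text; }
        String getText() { return text; }
    }

    // Hypothetical stand-in for XSLFConnectorShape: a simple shape without text.
    static class ConnectorShape extends SimpleShape {
        ConnectorShape(String name) { super(name); }
    }

    // Collects text only from shapes that really are text shapes.
    static List<String> collectText(List<Shape> shapes) {
        List<String> result = new ArrayList<>();
        for (Shape shape : shapes) {
            if (shape instanceof TextShape) {
                // Safe downcast: the instanceof check has already passed.
                result.add(((TextShape) shape).getText());
            }
            // An unguarded cast here would throw ClassCastException for ConnectorShape.
        }
        return result;
    }

    public static void main(String[] args) {
        List<Shape> shapes = new ArrayList<>();
        shapes.add(new TextShape("Box 1", "Start"));
        shapes.add(new ConnectorShape("Connector 2"));
        shapes.add(new TextShape("Box 3", "End"));
        System.out.println(collectText(shapes)); // prints [Start, End]
    }
}
```

With real POI types the same pattern would apply to the elements returned by getShapes(): test for XSLFTextShape, cast, then call getText().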
Upvotes: 1