Nicola

Reputation: 385

Java InputStreamReader Error (org.apache.poi.openxml4j.exceptions.InvalidOperationException)

I am trying to convert pptx files to txt (text extraction) using the Apache POI framework (Java). I'm new to Java, so I don't know much about BufferedReader, InputStream, etc.

What I tried is:

import org.apache.poi.xslf.XSLFSlideShow;
import org.apache.poi.xslf.extractor.XSLFPowerPointExtractor;
import org.apache.poi.xslf.usermodel.XMLSlideShow;    

... Classes and Stuff ....

String inputfile = "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx";
InputStream fis = new FileInputStream(inputfile);
BufferedReader br1 = new BufferedReader(new InputStreamReader(fis));
String fileName = br1.readLine();

System.out.println(new XSLFPowerPointExtractor(new XMLSlideShow(new XSLFSlideShow(fileName))).getText());
br1.close();

My goal is to store the extracted text in a variable, but I can't even print it to the console. What I get is:

org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: 'PK
org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:102)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:199) 
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178) 
org.apache.poi.POIXMLDocument.openPackage(POIXMLDocument.java:69) 
org.apache.poi.xslf.XSLFSlideShow.<init>(XSLFSlideShow.java:90) 

Any help would be greatly appreciated!

Upvotes: 1

Views: 1300

Answers (2)

ujulu

Reputation: 3309

I can't give you a definitive answer because I don't use POI myself, but I can tell you where your mistake probably lies. The constructor of the class XSLFSlideShow expects a file path as its argument, but you are passing the first line read from the file's contents instead. Try it as follows:

String filePath = "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx";
System.out.println(new XSLFPowerPointExtractor(new XMLSlideShow(new XSLFSlideShow(filePath))).getText());

Upvotes: 0

centic

Reputation: 15872

You are doing much too much. In fact, you are trying to use the contents of the PPTX itself as the file name. It is better to simply use

System.out.println(new XSLFPowerPointExtractor(
    new XMLSlideShow(new XSLFSlideShow(
    "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx"))).getText());

or, more generically,

POITextExtractor extractor = ExtractorFactory.createExtractor(
    new java.io.File("X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx"));
System.out.println(extractor.getText());
extractor.close();
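
To see why the exception message ends in 'PK, here is a minimal sketch that uses only the JDK and does not need POI. A .pptx file is a ZIP archive, and ZIP data begins with the two bytes "PK", so reading the first "line" of the file as text (as the code in the question does) produces a string starting with PK, which then gets passed on as if it were a file name. The class name ZipFirstLineDemo, the helper names, and the sample entry slide1.txt are made up for illustration; the small ZIP file only stands in for a real .pptx.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipFirstLineDemo {

    // Writes a tiny ZIP archive as a stand-in for a .pptx (hypothetical content).
    static Path writeSampleZip() throws IOException {
        Path zip = Files.createTempFile("sample", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("slide1.txt"));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return zip;
    }

    // Repeats the mistake from the question: treats the binary file as text
    // and returns its first "line".
    static String firstLineAsText(Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.ISO_8859_1))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = writeSampleZip();
        String firstLine = firstLineAsText(zip);
        // The "line" is raw ZIP data, not a path.
        System.out.println(firstLine.startsWith("PK")); // prints "true"
        Files.delete(zip);
    }
}
```

This is why passing the path string itself (or a java.io.File built from it) is the fix: no reader is needed, because the library opens the file on its own.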

Upvotes: 1
