Reputation: 702
I'd like to parse an Excel file with Java, so I'm using the Apache POI libraries. Here are the Maven dependencies:
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.14</version>
</dependency>
This pulls in the following transitive dependencies:
poi-ooxml-3.14.jar
poi-3.14.jar
commons-codec-1.10.jar
poi-ooxml-schemas-3.14.jar
xmlbeans-2.6.0.jar
stax-api-1.0.1.jar
curvesapi-1.03.jar
When I try to read an Office 365 Excel file (.xlsx) with this code:
import java.io.File;
import java.io.FileInputStream;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelConverter {
    public static void main(String[] args) throws Exception {
        String excelFilePath = "C:/temp/Book1.xlsx";
        File myFile = new File(excelFilePath);
        System.out.println("File exists: " + myFile.exists());
        // try-with-resources closes the stream even if the workbook constructor throws
        try (FileInputStream inputStream = new FileInputStream(myFile)) {
            // This constructor is where the Strict OOXML exception below is thrown
            Workbook workbook = new XSSFWorkbook(inputStream);
        }
    }
}
I got the following console message:
File exists: true
Exception in thread "main" org.apache.poi.POIXMLException: Strict OOXML isn't currently supported, please see bug #57699
at org.apache.poi.POIXMLDocumentPart.getPartFromOPCPackage(POIXMLDocumentPart.java:679)
at org.apache.poi.POIXMLDocumentPart.<init>(POIXMLDocumentPart.java:122)
at org.apache.poi.POIXMLDocumentPart.<init>(POIXMLDocumentPart.java:115)
at org.apache.poi.POIXMLDocument.<init>(POIXMLDocument.java:61)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:273)
at org.myCompany.excel.ExcelConverter.main(ExcelConverter.java:25)
Do you know what I can do to solve this issue? Thanks in advance.
Upvotes: 14
Views: 18319
Reputation: 3924
The excel-streaming-reader library now has a beta feature for converting from Strict OOXML, enabled by setting the flag convertFromOoXmlStrict in the StreamingReader builder.
Upvotes: -1
Reputation: 138
I use a slightly modified version of @PJFanning's ooxml converter https://github.com/pjfanning/ooxml-strict-converter to check for Strict Excel files, convert them, and then read them with POI. In limited testing it seems to work, although the files I have are pretty straightforward.
Upvotes: 1
Reputation: 718718
There doesn't currently appear to be any simple solution other than "Don't save your spreadsheet in 'strict OOXML' format."
For example, in Excel use
Save As --> "Excel Workbook (.xlsx)"
instead of
Save As --> "Strict Open XML Spreadsheet (.xlsx)"
Do you know why Excel Worksheet and this format have the same file extension?
That would be something that only Microsoft can answer. But I guess that the engineers (or their management) did not anticipate that it would be necessary for application software to make the distinction.
I am accepting Files as input and then processing them based on the extension. How can I know without try-catch?
There is nothing that will let you process the document with the current generation of POI.
I guess you could write code that reads the file and looks for the signature of the "strict OOXML" format[1] before passing the file to POI, but there's not much point. You would be writing a stack of extra code just so that you can replace the try-catch with other logic.
1 - See https://www.loc.gov/preservation/digital/formats/fdd/fdd000395.shtml#sign
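If you do want to check before handing the file to POI, here is a minimal JDK-only sketch. It rests on an assumption of mine, not something stated above: that a Strict package declares its main document in the `_rels/.rels` part using the relationship type `http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument`, whereas a normal (Transitional) workbook uses `http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument`. The class `StrictOoxmlCheck` and its methods are hypothetical names. It uses a plain substring match instead of a real XML parse and assumes the `.rels` part is UTF-8 encoded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: detects Strict OOXML packages before they reach POI. */
public class StrictOoxmlCheck {

    // Assumed relationship type of the main document part in Strict OOXML packages.
    static final String STRICT_OFFICE_DOCUMENT_REL =
            "http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument";

    /** True if the given _rels/.rels content references a Strict main document. */
    public static boolean isStrictRels(String relsXml) {
        // Simplification: substring check rather than parsing the XML.
        return relsXml.contains(STRICT_OFFICE_DOCUMENT_REL);
    }

    /** True if the ZIP package at 'path' has a Strict root relationship. */
    public static boolean isStrictOoxml(Path path) throws IOException {
        // new ZipFile(...) throws (a subclass of) IOException if the file is not a ZIP.
        try (ZipFile zip = new ZipFile(path.toFile())) {
            ZipEntry rels = zip.getEntry("_rels/.rels");
            if (rels == null) {
                return false; // no package relationships: let POI report the problem
            }
            try (InputStream in = zip.getInputStream(rels)) {
                String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                return isStrictRels(xml);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java StrictOoxmlCheck <file.xlsx>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(isStrictOoxml(file)
                ? "Strict OOXML: re-save as \"Excel Workbook (.xlsx)\" or convert it first"
                : "No Strict relationship found: pass it to XSSFWorkbook");
    }
}
```

You would call `isStrictOoxml(path)` before `new XSSFWorkbook(...)` and reject or convert Strict files. As noted above, this mostly trades the try-catch for other logic, and a corrupt or non-ZIP file will still raise an `IOException`.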
Upvotes: 23