Reputation: 181
I have a web application (Tomcat 8.5.4, Java 1.7.0_72) that was previously generating valid xlsx Excel files using Apache POI. I'm converting the application to manage its jar dependencies with Maven, and now the generated file is considered corrupted (or otherwise invalid) by Excel. I haven't changed the code that generates the file at all, and the included jars are mostly the same apart from some version changes and the removal of jars that were unused and not in the Maven dependency tree (removed jars listed below).
Does anyone know what I could be doing to cause POI to generate the files differently, or why Excel thinks these changes make the file invalid? I've searched a lot for errors with POI and corrupted Excel files, and it looked like there were several bugs in POI where it could corrupt existing files or break when creating large files, but nothing that looked like it applied in this case. I saw several questions here that looked similar but didn't end up applying.
When I rename the xlsx files created before and after to .zip, extract them, and compare the directories with WinDiff, the differences are (working -> corrupted):
[content_types].xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?> -> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"> -> <Types>
_rels.rels, _rels\workbook.xml.rels
<?xml version="1.0" encoding="UTF-8" standalone="no"?> -> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"> -> <Relationships>
docprops\core.xml
The time created is different
<?xml version="1.0" encoding="UTF-8" standalone="no"?> -> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
xl\styles.xml
the numFmtId is 1 lower
xl\worksheets\sheet1.xml, docprops\app.xml, xl\sharedstrings.xml, xl\workbook.xml
Identical
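The manual extract-and-compare step above can be approximated with a small check that opens the xlsx as a zip and lists the XML parts that carry no `xmlns` declaration at all. This is only a sketch: the class name `XlsxNamespaceCheck` and the method `partsWithoutNamespace` are made up for illustration, and a plain substring search for `xmlns` is a rough heuristic, not a real validation of the package.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of POI: flags package parts with no xmlns declaration.
public class XlsxNamespaceCheck {

    /** Returns the names of .xml and .rels entries whose text contains no "xmlns". */
    public static List<String> partsWithoutNamespace(InputStream xlsx) throws IOException {
        List<String> suspicious = new ArrayList<String>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                boolean isXmlPart = name.endsWith(".xml") || name.endsWith(".rels");
                if (entry.isDirectory() || !isXmlPart) {
                    continue;
                }
                // Reading the zip stream to EOF returns only the current entry's bytes.
                String content = new String(readAll(zip), StandardCharsets.UTF_8);
                if (!content.contains("xmlns")) {
                    suspicious.add(name);
                }
            }
        }
        return suspicious;
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java XlsxNamespaceCheck path/to/file.xlsx
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("Parts without xmlns: " + partsWithoutNamespace(in));
        }
    }
}
```

Running it against a working file and a corrupted one should make the missing namespace declarations stand out without needing a diff tool.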
My company uses a local Artifactory repository instead of pointing at the official Maven repository, so it's been lots of fun loading all the dependencies in. My assumption is that this is caused by a jar that is missing or at the wrong version, but I'm not getting any errors, just a bad file.
I'm including poi-3.11.jar, poi-ooxml-3.11.jar, and poi-ooxml-schemas-3.11.jar. Previously we had commons-codec 1.9 in tomcat/lib and commons-codec 1.3 in WEB-INF/lib. Under Maven I've included 1.9, although I also tried going back to 1.3 and the file was still corrupted.
I tried upgrading all of the POI jars to 3.14, but that didn't solve it. I tried going back to the exact poi[-ooxml-schemas]-3.11-20141221 jars that were working before, but that didn't solve it. I tried switching the SXSSFWorkbook to a normal XSSFWorkbook, but that didn't solve it either.
Here is the list of jars that I removed when converting to Maven. Would any of these have any effect on Apache POI?
ecj-4.5.1
el-impl-2.2
itext-2.0.8
jimi-1.0
js
opencsv-1.8
standard (1.1.2)
Upvotes: 2
Views: 3716
Reputation: 11
Thanks for describing this issue, Tim.
In my case, the root cause of this problem was different. There are no explicit or transitive dependencies on Xalan or Xerces in my project.
I use Joost and I decided to set the system property instead of using their class in the code:
System.setProperty(TransformerFactory::class.java.name, "net.sf.joost.trax.TransformerFactoryImpl")
After this line is called POI generates broken Excel files because it starts using net.sf.joost.trax.TransformerFactoryImpl internally.
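To see which implementation JAXP actually hands out at runtime, something like the following probe may help. It is a sketch only: `TransformerFactoryProbe` and `describe` are illustrative names, and it assumes POI obtains its transformer through the standard `TransformerFactory.newInstance()` lookup, which is what the behaviour described above suggests.

```java
import java.security.CodeSource;
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic: reports which TransformerFactory JAXP resolves and where it was loaded from.
public class TransformerFactoryProbe {

    public static String describe() {
        // Same lookup any library calling TransformerFactory.newInstance() would get.
        TransformerFactory factory = TransformerFactory.newInstance();
        Class<?> impl = factory.getClass();

        // The code source is typically a jar URL for third-party implementations;
        // it may be null for classes that ship with the JDK.
        CodeSource source = impl.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "JDK built-in or unknown location"
                : source.getLocation().toString();

        // A non-null value here means a system property is overriding the default lookup.
        String override = System.getProperty("javax.xml.transform.TransformerFactory");

        return impl.getName() + " from " + location + " (system property: " + override + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Calling `describe()` inside the web application, just before the workbook is written, should show whether Joost, an old Xalan, or the JDK's own implementation is being picked up.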
Upvotes: 0
Reputation: 1514
I had exactly the same issue and went through the same steps. For me, excluding the xalan dependency, which my project inherited transitively from another dependency, didn't help. Adding the latest version of xalan (xalan 2.7.1) explicitly is what saved my day.
Upvotes: 1
Reputation: 181
It turned out to be caused by a jar (xalan-2.4.1) that was pulled in as a dependency of fop-0.20.5 and that hadn't been included before converting to Maven. Once I excluded that dependency, the application created valid files again. I should have been suspicious of all of those really old jars from 2002 and 2003.
In case anyone has a similar problem in the future that isn't caused by the same jars, here are my troubleshooting steps:
I turned on POI logging with
-Dorg.apache.poi.util.POILogger=org.apache.poi.util.SystemOutLogger
-Dpoi.log.level=1
I found a few errors complaining about an outdated XML parser and being unable to set up the SAX Security Manager. Some internet searches led me to exclude xercesImpl (which our app had not previously included) from fop-0.20.5. Excluding it fixed the errors in the logger, but the generated file was still considered corrupt by Excel and still had the same differences.
Finally I broke down and made a new, simple Maven Java application that just created a very simple Excel file with POI. I initially tried it with just poi and poi-ooxml as the dependencies and it generated a valid file. When I added all the dependencies from my full application, it generated an invalid file. Then I removed one dependency at a time until it worked. The problem dependency was fop, which I still needed, so I then tried excluding each of its dependencies that our app had not previously included until the file worked, and identified xalan as the problem.
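A POI-free variant of that minimal test is sketched below. It builds a one-element DOM in the content-types namespace and serializes it with whatever `TransformerFactory` is on the classpath, then checks whether the `xmlns` declaration survived. The assumption (not verified against the POI source here) is that POI writes parts such as the content types file through the same JAXP identity transform; `NamespaceSerializationCheck` and its methods are illustrative names only.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical repro: does the transformer on the classpath keep namespace declarations?
public class NamespaceSerializationCheck {

    static final String NS = "http://schemas.openxmlformats.org/package/2006/content-types";

    /** Serializes a namespaced root element using the classpath's default TransformerFactory. */
    public static String serialize() throws Exception {
        DocumentBuilderFactory builders = DocumentBuilderFactory.newInstance();
        builders.setNamespaceAware(true);
        Document doc = builders.newDocumentBuilder().newDocument();

        // The namespace is set on the node itself; no explicit xmlns attribute is added,
        // so the serializer is responsible for emitting the declaration.
        Element root = doc.createElementNS(NS, "Types");
        doc.appendChild(root);

        // newTransformer() with no stylesheet gives an identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static boolean keepsNamespace(String xml) {
        return xml.contains("xmlns=\"" + NS + "\"");
    }

    public static void main(String[] args) throws Exception {
        String xml = serialize();
        System.out.println(xml);
        System.out.println(keepsNamespace(xml) ? "namespace kept" : "namespace DROPPED");
    }
}
```

Adding the full application's dependencies to this class's classpath and removing them one at a time follows the same bisection idea, with a much faster turnaround than opening each generated file in Excel.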
Upvotes: 8