Alexey Chibisov

Reputation: 198

Apache NiFi ConvertExcelToCSV processor "failed to read zip entry source" error is generated on a proper .xlsx file

I'm trying to convert a large (1.3 GB) .xlsx file to .csv with the ConvertExcelToCSV processor. The file is a proper .xlsx file and has no groupings. The processor fails with the error "failed to read zip entry source". When I significantly reduced the number of rows and saved the result as a copy, ConvertExcelToCSV processed it fine, so the error seems to be related to the file size. What could cause this error, and how can it be avoided? Thanks in advance. NiFi version is 1.6.0.

Upvotes: 1

Views: 478

Answers (1)

Alexey Chibisov

With help from the community, I finally found the cause of the error. An .xlsx file is actually a zip package of XML files (including one per sheet), and those XML files are much larger than the .xlsx itself. The XML inside the 1.3 GB file is around 10 GB, so the compressed file is only about 13% of the uncompressed size (roughly 87% compression). The Apache POI library that NiFi uses can't handle files larger than 2 GB. So the recommendation is to check the contents of your Excel file (rename the .xlsx to .zip and open it with an archive tool); if they exceed 2 GB, you can't process the file with NiFi.
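Because an .xlsx file is an ordinary zip archive, the same check can also be scripted instead of opening the archive by hand. Below is a minimal Python sketch using the standard `zipfile` module. The function names are hypothetical. The 2 GiB threshold (2 × 1024³ bytes) is an assumption standing in for the "2 GB" limit mentioned above, not a value read from NiFi or POI. The script reports both the total uncompressed size and any individual entry (for example a sheet's XML) that exceeds the threshold.

```python
import sys
import zipfile

# Assumed threshold: the "2 GB" limit from the answer, taken here as 2 GiB.
LIMIT_BYTES = 2 * 1024**3


def oversized_entries(path, limit=LIMIT_BYTES):
    """Return (name, uncompressed_size) for every zip entry larger than limit."""
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size)
                for info in zf.infolist()
                if info.file_size > limit]


def total_uncompressed(path):
    """Sum of the uncompressed sizes of all entries in the archive."""
    with zipfile.ZipFile(path) as zf:
        return sum(info.file_size for info in zf.infolist())


if __name__ == "__main__":
    # Usage: python check_xlsx.py big_file.xlsx [more.xlsx ...]
    for path in sys.argv[1:]:
        total = total_uncompressed(path)
        print(f"{path}: {total / 1024**3:.2f} GiB uncompressed")
        for name, size in oversized_entries(path):
            print(f"  {name}: {size / 1024**3:.2f} GiB exceeds the assumed limit")
```

Run it as `python check_xlsx.py big_file.xlsx` (the script name is hypothetical). It only reads the sizes recorded in the zip directory and does not decompress anything, so it returns quickly even on multi-gigabyte workbooks.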

Upvotes: 1
