Peter Osta
Peter Osta

Reputation: 21

converting excel to XSSFWorkbook using Apache poi taking long time

I'm trying to convert an excel file to XSSFWorkbook , I have around 7000 rows and around 145 columns. It's taking around 15 mins to convert that excel file to XSSFWorkbook at line number 2 , in my code below :-

InputStream fs = new FileInputStream(filename);   // (1)
XSSFWorkbook wb = new XSSFWorkbook(fs);           // (2)
XSSFSheet sheet = wb.getSheetAt(0); 

Instead of adding 7000 rows to XFFSWorkbook , I just want to add 30 rows to XFFSWorkbook while conversion at line number 2 ?

If not, how can I reduce the amount of time taken to convert excel to XSSFWorkbook ?

Upvotes: 2

Views: 2475

Answers (2)

rjdkolb
rjdkolb

Reputation: 11848

To increase performance slightly on very large files, simply pass the file directly in to the workbook instead of a stream.

From this link:

When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file.

XSSFWorkbook wb = new 
XSSFWorkbook(new File (filename));           // (2)
XSSFSheet sheet = wb.getSheetAt(0);

Also, it's cleaner to use the WorkBookFactory as it abstracts the type of excel file you are working with :

Workbook workbook = WorkbookFactory.create(new File(filename));

Upvotes: 1

Axel Richter
Axel Richter

Reputation: 61870

A *.xlsx file is a ZIP archive containing the data of the Excel in a directory structure having different XML files.

For example there are

  • /xl/workbook.xml describing the basic workbook structure,
  • /xl/worksheets/sheet1.xml, /xl/worksheets/sheet2.xml, ... /xl/worksheets/sheetN.xml having the sheet data - Here are the rows and the cells but not all data within the cells are directly stored there. Also the cell styles are not directly stored there. - ,
  • /xl/styles.xml which contains the cell styles,
  • /xl/sharedStrings.xml which contains all string content of cells in all sheets. This is to avoid multiple storing the same string much times if this string is used multiple times in cells.

So if you wants to read the *.xlsx ZIP archive, you needs unpacking the ZIP archive and then parsing at least the four XML files mentioned above to get the data for the XSSFWorkbook. This is what apache poi does while XSSFWorkbook wb = new XSSFWorkbook(fileinputstream);.

So if you really needs a XSSFWorkbook as the result, there is no way around this process. And if you not suspects that apache poi had programmed explicit delay routines, then there will not be a possibility to reduce the amount of time for this process.

Your approach only to read less rows than are stored into the sheet, could possibly be time saving. But then your result would be a XSSFWorkbook containing all the styles and all the string contents but only some sheet data related to those styles and string data. So it will lead to a partially broken XSSFWorkbook. Thats why nobody has really thought about this approach.

Only if the requirement is only to read the plain unformatted data from one of the /xl/worksheets/sheetN.xml without creating a XSSFWorkbook, then you only needs unpacking the ZIP archive and then parsing only the needed /xl/worksheets/sheetN.xml and the /xl/sharedStrings.xml to get the string content of the cells from. This would be possible in less time than the whole process described above.

Upvotes: 1

Related Questions