Reputation: 21
I'm trying to convert an excel file to XSSFWorkbook , I have around 7000 rows and around 145 columns. It's taking around 15 mins to convert that excel file to XSSFWorkbook at line number 2 , in my code below :-
InputStream fs = new FileInputStream(filename); // (1)
XSSFWorkbook wb = new XSSFWorkbook(fs); // (2)
XSSFSheet sheet = wb.getSheetAt(0);
Instead of adding 7000 rows to XFFSWorkbook , I just want to add 30 rows to XFFSWorkbook while conversion at line number 2 ?
If not, how can I reduce the amount of time taken to convert excel to XSSFWorkbook ?
Upvotes: 2
Views: 2475
Reputation: 11848
To increase performance slightly on very large files, simply pass the file directly in to the workbook instead of a stream.
From this link:
When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file.
XSSFWorkbook wb = new
XSSFWorkbook(new File (filename)); // (2)
XSSFSheet sheet = wb.getSheetAt(0);
Also, it's cleaner to use the WorkBookFactory as it abstracts the type of excel file you are working with :
Workbook workbook = WorkbookFactory.create(new File(filename));
Upvotes: 1
Reputation: 61870
A *.xlsx
file is a ZIP
archive containing the data of the Excel
in a directory structure having different XML
files.
For example there are
/xl/workbook.xml
describing the basic workbook structure,/xl/worksheets/sheet1.xml
, /xl/worksheets/sheet2.xml
, ...
/xl/worksheets/sheetN.xml
having the sheet data - Here are the rows
and the cells but not all data within the cells are directly stored
there. Also the cell styles are not directly stored there. - ,/xl/styles.xml
which contains the cell styles,/xl/sharedStrings.xml
which contains all string content of cells in
all sheets. This is to avoid multiple storing the same string much
times if this string is used multiple times in cells.So if you wants to read the *.xlsx
ZIP
archive, you needs unpacking the ZIP
archive and then parsing at least the four XML
files mentioned above to get the data for the XSSFWorkbook
. This is what apache poi
does while XSSFWorkbook wb = new XSSFWorkbook(fileinputstream);
.
So if you really needs a XSSFWorkbook
as the result, there is no way around this process. And if you not suspects that apache poi
had programmed explicit delay routines, then there will not be a possibility to reduce the amount of time for this process.
Your approach only to read less rows than are stored into the sheet, could possibly be time saving. But then your result would be a XSSFWorkbook
containing all the styles and all the string contents but only some sheet data related to those styles and string data. So it will lead to a partially broken XSSFWorkbook
. Thats why nobody has really thought about this approach.
Only if the requirement is only to read the plain unformatted data from one of the /xl/worksheets/sheetN.xml
without creating a XSSFWorkbook
, then you only needs unpacking the ZIP
archive and then parsing only the needed /xl/worksheets/sheetN.xml
and the /xl/sharedStrings.xml
to get the string content of the cells from. This would be possible in less time than the whole process described above.
Upvotes: 1