Habib Karbasian
Habib Karbasian

Reputation: 676

Converting a large data into excel in Apachi POI Java

I am working on a large CSV (~200 mb of text file) which I would like to convert into excel sheet but the workbook becomes so memory consuming that in the middle of the process, Java throws "GC Overhead limit exceeded"! I have checked the code if I am generating dummy references but I think none exists.

In my opinion those library calls from Apachi - POI might generate some references that keeps garbage collector so busy.

My question is if I could just write the workbook into a file chunk by chunk like text file something like appending to a text file without bringing it into memory. Is there any solution for that or am I missing something here?

GC throws the exception in the following code:

    private void updateExcelWorkbook(String input, String fileName, Workbook workbook) {
    try {
        Sheet sheet = workbook.createSheet(fileName);

        // Create a new font and alter it.
        Font font = workbook.createFont();
        font.setFontHeightInPoints((short) 11);
        font.setBold(true);


        // Fonts are set into a style so create a new one to use.
        CellStyle style = workbook.createCellStyle();
        style.setFont(font);
        Row row;
        Cell cell;
        String[] columns;
        String[] lines = input.split("\n");
        int colIndex;
        int rowIndex = 1;

        for (String line : lines) {
            row = sheet.createRow(rowIndex++);
            columns = line.split("\t");
            colIndex = 0;

            for (String column: columns) {
                cell = row.createCell(colIndex++);
                if (rowIndex == 1)
                    cell.setCellStyle(style);
                cell.setCellValue(column);
            }
        }
    } catch (Exception ex) {
        System.out.println(ex.getMessage());
    }
}

Upvotes: 2

Views: 2898

Answers (1)

Andreas
Andreas

Reputation: 159175

Seems you are using the POI usermodel, which has a very high memory footprint, because it keeps the entire worksheet in memory, similar to how DOM keeps an entire XML document in memory.

You need to use a streaming API. Using POI, you can create .xlsx files using the SXSSF Buffered Streaming API, as mentioned here: https://poi.apache.org/spreadsheet/index.html#SXSSF+(Since+POI+3.8+beta3)

The page linked above has this image, showing the Spreadsheet API Feature Summary of POI:
Spreadsheet API Feature Summary
(source: apache.org)

Upvotes: 3

Related Questions