antken
antken

Reputation: 989

How to speed up autosizing columns in apache POI?

I use the following code in order to autosize columns in my spreadsheet:

for (int i = 0; i < columns.size(); i++) {
   sheet.autoSizeColumn(i, true);
   sheet.setColumnWidth(i, sheet.getColumnWidth(i) + 600);
}

The problem is it takes more than 10 minutes to autosize each column in case of large spreadsheets with more than 3000 rows. It goes very fast for small documents though. Is there anything which could help autosizing to work faster?

Upvotes: 33

Views: 42027

Answers (5)

Rjelinek
Rjelinek

Reputation: 26

The biggest problem of XSSFSheet I encountered is that it doesn't have a funcion to return all values of specific column, you have to iterate all by row, so when you specify column, it actually iterates over all rows and columns. That probably where the slowness comes from I made my own implementation where it iterates over all columns and rows only onece, and then change the columns specified in listOfColumns, I also added maxWidth for cases where you work with long texts, so you don't end up with crazy widths. It would probably be good idea to add implementation for font width as Zz'Rot have in his implementation

private static void autoSizeColumns(XSSFSheet sheet, Integer maxWidth, List<Integer> listOfColumns) {
    HashMap<Integer,Integer> sizesArray = new HashMap<>();
    for (Row row : sheet) {
        for (Cell cell : row) {
            int column = cell.getColumnIndex();
            int oldMaxSize = Optional.ofNullable(sizesArray.get(column)).orElse(0);
            int newMaxSize = switch (cell.getCellType()) {
                case STRING,FORMULA ->
                        Math.max(oldMaxSize, cell.getStringCellValue().length());
                case NUMERIC ->
                        Math.max(oldMaxSize, Double.toString(cell.getNumericCellValue()).length());
                case BOOLEAN ->
                        Math.max(oldMaxSize, Boolean.toString(cell.getBooleanCellValue()).length());
                case ERROR ->
                        Math.max(oldMaxSize, Byte.toString(cell.getErrorCellValue()).length());
                default ->
                        oldMaxSize;
            };
            if(oldMaxSize != newMaxSize)
                sizesArray.put(column, newMaxSize);
        }
    }
    if(listOfColumns == null){
        for (int i = 0; i < sizesArray.size(); i++) {
            int width = sizesArray.get(i) * 256;
            if (maxWidth != null && width > maxWidth) {
                width = maxWidth;
            }
            sheet.setColumnWidth(i, width);
        }
        return;
    }
    for (Integer column : listOfColumns) {
        Integer size = sizesArray.get(column);
        int width = sizesArray.get(size) * 256;
        if (maxWidth != null && width > maxWidth) {
            width = maxWidth;
        }
        sheet.setColumnWidth(column, width);
    }
}

Upvotes: 0

Erich13
Erich13

Reputation: 69

Unfortunately I don't have enough reputations yet to add comments in answers. So here some annotations:

  • When using Row row = sheet.getRow(sheet.getFirstRowNum()); be shure, this row contains at least a value in the last column. Otherwise the cellIterator will end too early, i.e. if a subsequent row has a value in this column, this column will not be autosized. This problem is bypassed if rowcontains the headers (names of the columns). Or explicit use a known header row, e.g.
    int indexOfHeaderRow = ...;
    ...
    Row row = sheet.getRow(indexOfHeaderRow);
  • Jakub Słowikowski
    sheet.setColumnWidth(i, Math.min(width, MAX_CELL_WIDTH)); // <- set calculated cellwidth
    I'm not shure about this line because there is no information about content of MAX_CELL_WIDTH - perhaps overall maximum? So I used instead:
    sheet.setColumnWidth(i, Math.max(width, 2048));
    2048 seams to be the default width? This value prevents extremely narrow widths for empty columns.

Upvotes: 0

Jakub Słowikowski
Jakub Słowikowski

Reputation: 1553

The autosizeColumn() function very slow and unneficient. Even authors of apache POI mentioned in docs, that:

This process can be relatively slow on large sheets, ...

Calculating and setting the cell's width manually is way faster - in my case I reduced the time from ~25,000ms to ~1-5ms.

This is how to achieve it (I was basing on Vladimir Shcherbukhin's answer:

Workbook workbook = new XSSFWorkbook();
Sheet sheet = workbook.createSheet();
final int[] maxNumCharactersInColumns = new int[headers.length]; // maximum number of characters in columns. Necessary to calculate the cell width in most efficient way. sheet.autoSizeColumn(...) is very slow.

Row headersRow = sheet.createRow(0);
CellStyle headerStyle = createHeadersStyle(workbook); // createHeadersStyle() is my own function. Create headers style if you want

for (int i = 0; i < headers.length; i++) { // create headers
        Cell headerCell = headersRow.createCell(i, CELL_TYPE_STRING);
        headerCell.setCellValue(headers[i]);
        headerCell.setCellStyle(headerStyle);

        int length = headers[i].length();
        if (maxNumCharactersInColumns[i] < length) { // adjust the columns width
            maxNumCharactersInColumns[i] = length + 2; // you can add +2 if you have filtering enabled on your headers
        }
}

int rowIndex = 1;
    for (List<Object> rowValues : rows) {
        Row row = sheet.createRow(rowIndex);

        int columnIndex = 0;
        for (Object value : rowValues) {
            Cell cell = createRowCell(row, value, columnIndex); // createRowCell() is my own function.

            int length;
            if (cell.getCellType() == Cell.CELL_TYPE_STRING) {
                String cellValue = cell.getStringCellValue();

                // this is quite important part. In some excel spreadsheet you can have a values with line-breaks. It'll be cool to handle that scenario :)
                String[] arr = cellValue.split("\n"); // if cell contains complex value with line breaks, calculate only the longest line
                length = Arrays.stream(arr).map(String::length).max(Integer::compareTo).get();
            } else {
                length = value != null ? value.toString().length() : 0;
            }

            if (maxNumCharactersInColumns[columnIndex] < length) { // if the current cell value is the longest one, save it to an array
                maxNumCharactersInColumns[columnIndex] = length;
            }

            columnIndex++;
        }
        rowIndex++;
    }

    for (int i = 0; i < headers.length; i++) {
        int width = (int) (maxNumCharactersInColumns[i] * 1.45f) * 256; // 1.45f <- you can change this value
        sheet.setColumnWidth(i, Math.min(width, MAX_CELL_WIDTH)); // <- set calculated cell width
    }

    sheet.setAutoFilter(new CellRangeAddress(0, 0, 0, headers.length - 1));

    ByteArrayOutputStream output = new ByteArrayOutputStream();
    workbook.write(output);
    workbook.close();

Upvotes: 0

Vladimir Shcherbukhin
Vladimir Shcherbukhin

Reputation: 29

The autoSizeColumn function itself works not perfect and some columns width not exactly fit the data inside. So, I found some solution that works for me.

  1. To avoid crazy calculations let give that to autoSizeColumn() function:
   sheet.autoSizeColumn(<columnIndex>);
  1. Now, our column autosized by library but we wont to add a little bit more to the current column width to make table looks fine:
   // get autosized column width
   int currentColumnWidth = sheet.getColumnWidth(<columnIndex>);

   // add custom value to the current width and apply it to column
   sheet.setColumnWidth(<columnIndex>, (currentColumnWidth + 2500));
  1. The full function could looks like:
   public void autoSizeColumns(Workbook workbook) {
        int numberOfSheets = workbook.getNumberOfSheets();
        for (int i = 0; i < numberOfSheets; i++) {
            Sheet sheet = workbook.getSheetAt(i);
            if (sheet.getPhysicalNumberOfRows() > 0) {
                Row row = sheet.getRow(sheet.getFirstRowNum());
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    int columnIndex = cell.getColumnIndex();
                    sheet.autoSizeColumn(columnIndex);
                    int currentColumnWidth = sheet.getColumnWidth(columnIndex);
                    sheet.setColumnWidth(columnIndex, (currentColumnWidth + 2500));
                }
            }
        }
    }

P.S. Thanks Ondrej Kvasnovsky for the function https://stackoverflow.com/a/35324693/13087091

Upvotes: 2

antken
antken

Reputation: 989

Solution which worked for me:

It was possible to avoid merged regions, so I could iterate through the other cells and finally autosize to the largest cell like this:

int width = ((int)(maxNumCharacters * 1.14388)) * 256;
sheet.setColumnWidth(i, width);

where 1.14388 is a max character width of the "Serif" font and 256 font units.

Performance of autosizing improved from 10 minutes to 6 seconds.

Upvotes: 55

Related Questions