Fitz
Fitz

Reputation: 327

Apache POI : Check if a column is empty

I need to check if a column is empty or not in .xlsx files but couldn't find anything smoother than this :

    public static boolean isColumnEmpty(int column, int firstRow, XSSFSheet sheet) {


    XSSFRow row = sheet.getRow(firstRow);

    while (row != null) {
        Cell c = row.getCell(column, Row.RETURN_BLANK_AS_NULL);
        if (c != null) {
            return false;
        }
        row = sheet.getRow(firstRow++);
    }
    return true;

}

firstRow is just the row you want to start with (actually my column isn't completely empty, there still is a header).

I wonder if some of you had better ideas!

Upvotes: 2

Views: 6056

Answers (1)

IceArdor
IceArdor

Reputation: 2041

The answer varies on how sparse the physical rows are in your sheet, your desire to have simple code, and how much you care about execution speed.

A good compromise of the three would only loop through physical rows, and perform decently well as long as startRow is much closer to getFirstRowNum() than getLastRowNum().

public static boolean isColumnEmpty(Sheet sheet, int columnIndex, int startRow) {
    for (Row row : sheet) {
        if (row.getRowNum() < startRow) continue;
        Cell cell = row.getCell(columnIndex, Row.RETURN_BLANK_AS_NULL);
        if (cell != null) {
            return false;
        }
    }
    return true;
}

For a workbook with dense rows, your code is better.

For the code that does the least work, you could combine these two approaches (I prefer for-loops over while-loops as it's quicker to verify that your code won't get stuck in an infinite loop)

public static boolean isColumnEmpty(Sheet sheet, int columnIndex, int startRow) {
    int firstRow = sheet.getFirstRowNum();
    int lastRow = sheet.getLastRowNum();
    // No need to check rows above the first row
    startRow = Math.max(startRow, firstRow);
    int numRows = sheet.getPhysicalNumberOfRows();

    // Compute an estimate of the number of rows that each method
    // will access.
    // Assume the cost to access one row is the same
    // between an explicit getRow() or through the rowIterator.
    // Assume physical rows are uniformly spaced, which is unlikely true
    // but the best estimate possible without iterating over the rows.
    double rowDensity = (lastRow - firstRow + 1) / numRows;
    double estimatedForEachLoopCost = numRows;
    double estimatedForLoopCost = (lastRow - startRow) + 1) * rowDensity;
    if (estimatedForEachLoopCost < estimatedForLoopCost) {
        // for-each iteration
        for (Row row : sheet) {
            if (row.getRowNum() < startRow) continue;
            Cell cell = row.getCell(columnIndex, Row.RETURN_BLANK_AS_NULL);
            if (cell != null) {
                return false;
            }
        }
        return true;
    } else {
        for (int r=startRow; r<=lastRow; r++) {
            Row row = sheet.getRow(r);
            if (row == null) continue;
            Cell cell = row.getCell(columnIndex, Row.RETURN_BLANK_AS_NULL);
            if (cell != null) {
                return false;
            }
        }
        return true;
    }
}

If you really care about performance, you can fork POI and write a method to expose the TreeMap<Integer, XSSFRow> that XSSFSheet uses to access the rows. Then you could access the minimal number of rows with _rows.tailMap(startRow, inclusive=true).

If you put in a patch and test case on the POI bugzilla for a method that returns java.util.Collections.unmodifiableSortedMap(_rows.subMap(startRow, true, endRow, true)) from HSSF, XSSF, and SXSSF (fails if start row or end row are outside access window, or uses a column tracker similar to autosize column tracker), then add an isColumnEmpty function to the appropriate class, then you could avoid maintaining a fork if your patch gets accepted.

Upvotes: 2

Related Questions