Reputation: 327
I need to check if a column is empty or not in .xlsx files but couldn't find anything smoother than this :
public static boolean isColumnEmpty(int column, int firstRow, XSSFSheet sheet) {
XSSFRow row = sheet.getRow(firstRow);
while (row != null) {
Cell c = row.getCell(column, Row.RETURN_BLANK_AS_NULL);
if (c != null) {
return false;
}
row = sheet.getRow(firstRow++);
}
return true;
}
firstRow
is just the row you want to start with (actually my column isn't completely empty, there still is a header).
I wonder if some of you had better ideas!
Upvotes: 2
Views: 6056
Reputation: 2041
The answer varies on how sparse the physical rows are in your sheet, your desire to have simple code, and how much you care about execution speed.
A good compromise of the three would only loop through physical rows, and perform decently well as long as startRow
is much closer to getFirstRowNum()
than getLastRowNum()
.
public static boolean isColumnEmpty(Sheet sheet, int columnIndex, int startRow) {
for (Row row : sheet) {
if (row.getRowNum() < startRow) continue;
Cell cell = row.getCell(columnIndex, Row.RETURN_BLANK_AS_NULL);
if (cell != null) {
return false;
}
}
return true;
}
For a workbook with dense rows, your code is better.
For the code that does the least work, you could combine these two approaches (I prefer for
-loops over while
-loops as it's quicker to verify that your code won't get stuck in an infinite loop)
public static boolean isColumnEmpty(Sheet sheet, int columnIndex, int startRow) {
int firstRow = sheet.getFirstRowNum();
int lastRow = sheet.getLastRowNum();
// No need to check rows above the first row
startRow = Math.max(startRow, firstRow);
int numRows = sheet.getPhysicalNumberOfRows();
// Compute an estimate of the number of rows that each method
// will access.
// Assume the cost to access one row is the same
// between an explicit getRow() or through the rowIterator.
// Assume physical rows are uniformly spaced, which is unlikely true
// but the best estimate possible without iterating over the rows.
double rowDensity = (lastRow - firstRow + 1) / numRows;
double estimatedForEachLoopCost = numRows;
double estimatedForLoopCost = (lastRow - startRow) + 1) * rowDensity;
if (estimatedForEachLoopCost < estimatedForLoopCost) {
// for-each iteration
for (Row row : sheet) {
if (row.getRowNum() < startRow) continue;
Cell cell = row.getCell(columnIndex, Row.RETURN_BLANK_AS_NULL);
if (cell != null) {
return false;
}
}
return true;
} else {
for (int r=startRow; r<=lastRow; r++) {
Row row = sheet.getRow(r);
if (row == null) continue;
Cell cell = row.getCell(columnIndex, Row.RETURN_BLANK_AS_NULL);
if (cell != null) {
return false;
}
}
return true;
}
}
If you really care about performance, you can fork POI and write a method to expose the TreeMap<Integer, XSSFRow>
that XSSFSheet
uses to access the rows.
Then you could access the minimal number of rows with _rows.tailMap(startRow, inclusive=true)
.
If you put in a patch and test case on the POI bugzilla for a method that returns java.util.Collections.unmodifiableSortedMap(_rows.subMap(startRow, true, endRow, true))
from HSSF, XSSF, and SXSSF (fails if start row or end row are outside access window, or uses a column tracker similar to autosize column tracker), then add an isColumnEmpty function to the appropriate class, then you could avoid maintaining a fork if your patch gets accepted.
Upvotes: 2