Reputation: 1565
I am verifying an Excel spreadsheet. I want to ensure that all of the cells in specific columns, store the same data types.
To do so, I am iterating over rows, and then collecting a set of CellType's returned by getCellType method. Once I have such set, I could ensure that set has only one item and whether it is an expected type.
Problem
Unfortunately, Apache POI CellTypes are quite generic. For example NUMERIC is used for whole numbers as well as date types. They also do not provide me an information about size of the field. Moreover, I noticed that sometimes dates are recognized as String.
My question is:
Upvotes: 0
Views: 1217
Reputation: 61890
What you have observed is not a drawback of apache poi
but is how Excel
stores cell contents. There are only cells of type String
, Numeric
, Boolean
, Error
or Formula
in Excel
. Whether a numeric cell contains a date is determined only by the cell's number format. If the cell is date formatted, then it shows the number as a date, else not. And if a cell is of type String
but the string looks like a date, then even Excel
will not handle that content as a date automatically.
So the only way is determining the different cell contents as shown in Busy Developers' Guide to HSSF and XSSF Features - Getting the cell contents. Mainly differentiate by cell type. For Numeric
cell type additional differentiate by DateUtil.isCellDateFormatted
. And for formula cells maybe you additional needs to get the result type of the formula.
Complete example which reads all cells of first sheet of SAMPLE.xlsx
and determines the cell types as exact as possible.
import java.io.FileInputStream;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.ss.usermodel.CellType;
class ReadExcelExample {
public static void main(String[] args) throws Exception {
Workbook wb = WorkbookFactory.create(new FileInputStream("SAMPLE.xlsx"));
Sheet sheet = wb.getSheetAt(0);
for (Row row : sheet) {
for (Cell cell : row) {
CellReference cellRef = new CellReference(row.getRowNum(), cell.getColumnIndex());
System.out.print(cellRef.formatAsString() + " ");
//switch (cell.getCellTypeEnum()) { // until apache poi 3.17
switch (cell.getCellType()) { // from apache poi 4.0.0
case STRING:
System.out.println("is a string cell: " + cell.getRichStringCellValue().getString());
break;
case NUMERIC:
if (DateUtil.isCellDateFormatted(cell)) {
System.out.println("is a date cell: " + cell.getDateCellValue());
} else {
System.out.println("is a numeric cell: " + cell.getNumericCellValue());
}
break;
case BOOLEAN:
System.out.println("is a boolean cell: " + cell.getBooleanCellValue());
break;
case FORMULA:
System.out.print("is a formula cell: " + cell.getCellFormula());
//switch (cell.getCachedFormulaResultTypeEnum()) { // until apache poi 3.17
switch (cell.getCachedFormulaResultType()) { // from apache poi 4.0.0
case STRING:
System.out.println(" returning a string: " + cell.getRichStringCellValue().getString());
break;
case NUMERIC:
if (DateUtil.isCellDateFormatted(cell)) {
System.out.println(" returning a date: " + cell.getDateCellValue());
} else {
System.out.println(" returning a number: " + cell.getNumericCellValue());
}
break;
case BOOLEAN:
System.out.println(" returning an boolean: " + cell.getBooleanCellValue());
break;
case ERROR:
System.out.println(" returning an error: " + cell.getErrorCellValue());
break;
default:
System.out.println("default formula cell"); //should never occur
}
break;
case ERROR:
System.out.println("is a error cell: " + cell.getErrorCellValue());
break;
case BLANK:
System.out.println("is a blank cell");
break;
default:
System.out.println("default cell"); //should never occur
}
}
}
wb.close();
}
}
Upvotes: 2