Reputation: 14022
I'm trying to convert an Excel file into a tab-separated text file via Apache POI. The Excel file has some cells that contain formulas and some empty cells.
Here's a sample of the original Excel file:
Here's an extract of the final output:
'US' 'USORACLEAP' SYSTEMREFERENCE SUPPLIERID SUPPLIERNAME CLASSIFICATION VENDOR_SITE_CODE SUPPLIERADDRESS1 SUPPLIERADDRESS2 STATE ZIPCODE COUNTRY SOURCE INVOICENUM INVOICEDATE PAYMENTDATE LINE_DESC GL_COMPANY GL_CODE GL_DESCR COSTCENTER CC_DESCR CURRENCY_CODE CHECK_NUMBER NUM_DOCS SPEND TERM PAYMENT_METHOD SYSTEM_APPROVED PO_DISTRIBUTION_ID WALKER_COST_CENTER RGL_LEDGER_ENTITY
US US Oracle AP RANDBETWEEN(3000,100000) "TEXT "&D2 VENDOR "TEXT "&D3 "TEXT "&D3 "TEXT "&D3 ONTARIO RIGHT(D2,5) US "TEXT "&D3 "TEXT "&D3 RANDBETWEEN(43831, 44150) RANDBETWEEN(44105,44135) "TEXT "&D3 RIGHT("000"&RANDBETWEEN(1,999),3) RANDBETWEEN(55000, 60000) "TEXT "&D3 "TEXT "&D3 "TEXT "&D3 USD RANDBETWEEN(2000000,2100000) RANDBETWEEN(1,4) RANDBETWEEN(1,100000)/100 IMMEDIATE Check "TEXT"&D2 X2
US US Oracle AP 31836 "TEXT "&D3 1099 "TEXT "&D4 "TEXT "&D4 "TEXT "&D4 NY RIGHT(D3,5) US "TEXT "&D4 "TEXT "&D4 RANDBETWEEN(43831,44150) RANDBETWEEN(44105,44135) "TEXT "&D4 RIGHT("000"&RANDBETWEEN(1,999),3) RANDBETWEEN(55000,60000) "TEXT "&D4 "TEXT "&D4 "TEXT "&D4 USD RANDBETWEEN(2000000,2100000) RANDBETWEEN(1,4) RANDBETWEEN(1,100000)/100 IMMEDIATE Check GSUEDCM03 AF2
US US Oracle AP 3504 "TEXT "&D4 VENDOR "TEXT "&D5 "TEXT "&D5 "TEXT "&D5 NY RIGHT(D4,5) US "TEXT "&D5 "TEXT "&D5 RANDBETWEEN(43831,44150) RANDBETWEEN(44105,44135) "TEXT "&D5 RIGHT("000"&RANDBETWEEN(1,999),3) RANDBETWEEN(55000,60000) "TEXT "&D5 "TEXT "&D5 "TEXT "&D5 USD RANDBETWEEN(2000000,2100000) RANDBETWEEN(1,4) RANDBETWEEN(1,100000)/100 IMMEDIATE ACH GSUEIT001 AF3
US US Oracle AP 3504 "TEXT "&D5 VENDOR "TEXT "&D6 "TEXT "&D6 "TEXT "&D6 NY RIGHT(D5,5) US "TEXT "&D6 "TEXT "&D6 RANDBETWEEN(43831,44150) RANDBETWEEN(44105,44135) "TEXT "&D6 RIGHT("000"&RANDBETWEEN(1,999),3) RANDBETWEEN(55000,60000) "TEXT "&D6 "TEXT "&D6 "TEXT "&D6 USD RANDBETWEEN(2000000,2100000) RANDBETWEEN(1,4) RANDBETWEEN(1,100000)/100 IMMEDIATE ACH GSUEIT001 AF4
US US Oracle AP 3504 "TEXT "&D6 VENDOR "TEXT "&D7 "TEXT "&D7 "TEXT "&D7 NY RIGHT(D6,5) US "TEXT "&D7 "TEXT "&D7 RANDBETWEEN(43831,44150) RANDBETWEEN(44105,44135) "TEXT "&D7 RIGHT("000"&RANDBETWEEN(1,999),3) RANDBETWEEN(55000,60000) "TEXT "&D7 "TEXT "&D7 "TEXT "&D7 USD RANDBETWEEN(2000000,2100000) RANDBETWEEN(1,4) RANDBETWEEN(1,100000)/100 IMMEDIATE ACH GSUEIT001 AF5
As you can see, the 1st row represents column headers. Some of the cells (e.g. D1) have been written out as the formula itself instead of its value. The 3rd column doesn't have any values, so the whole content shifted to the left in the text file.
Here's the code:
private void convertXlsToText(InputStream inputStream, String delimiter, File targetFile) throws IOException {
    StringBuilder sb = new StringBuilder();
    setMinInflateRatio(0);
    try (Workbook wb = create(inputStream)) {
        Sheet firstSheet = wb.getSheetAt(0);
        for (Row nextRow : firstSheet) {
            Iterator<Cell> cellIterator = nextRow.cellIterator();
            while (cellIterator.hasNext()) {
                Cell cell = cellIterator.next();
                switch (cell.getCellType()) {
                    case STRING:
                        sb.append(cell.getStringCellValue()).append(delimiter);
                        break;
                    case BOOLEAN:
                        sb.append(cell.getBooleanCellValue()).append(delimiter);
                        break;
                    case NUMERIC:
                        sb.append(cell.getNumericCellValue()).append(delimiter);
                        break;
                    case FORMULA:
                        sb.append(cell.getCellFormula()).append(delimiter);
                        break;
                    default:
                        sb.append(EMPTY).append(delimiter);
                }
            }
            sb.append(DEFAULT_LINE_END);
        }
    }
    dumpStringBuilderToFile(sb, targetFile);
}
Can someone please point out what changes I should make in my code to fix the alignment and the formula issue?
PS: I'm using TAB (\t) as my delimiter.
UPDATE: Here's the updated code after the suggestions.
private void convertXlsToText(InputStream inputStream, String delimiter, File targetFile) throws IOException {
    StringBuilder sb = new StringBuilder();
    setMinInflateRatio(0);
    try (Workbook wb = create(inputStream)) {
        Sheet firstSheet = wb.getSheetAt(0);
        FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
        DataFormatter formatter = new DataFormatter();
        for (Row nextRow : firstSheet) {
            Iterator<Cell> cellIterator = nextRow.cellIterator();
            while (cellIterator.hasNext()) {
                Cell cell = cellIterator.next();
                if (cell != null) {
                    sb.append(format("%-20s", formatter.formatCellValue(cell, evaluator))).append(delimiter);
                } else {
                    sb.append(format("%-20s", EMPTY)).append(delimiter);
                }
            }
            sb.append(DEFAULT_LINE_END);
        }
    }
    dumpStringBuilderToFile(sb, targetFile);
}
Upvotes: 1
Views: 451
Reputation: 61860
If the requirement is to write Excel data into a text file, then all cell values need to be retrieved as String. A convenient way to do so is the DataFormatter of Apache POI. Using DataFormatter you get the cell values as they are shown in the Excel sheet, e.g. with number formats and date formats applied. And if you use DataFormatter together with a FormulaEvaluator, then formulas get evaluated and the evaluated values are converted to String.
To avoid missing empty cells, one needs to determine the cell count first, because the cell iterator skips cells that do not exist. For example, the cell count of the header row can serve as the cell count for every further row.
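Why iterating over existing cells shifts the columns can be illustrated without Apache POI at all. The following is only a minimal sketch: the row is modelled as a sorted map from column index to value (a hypothetical stand-in for a POI Row, not its actual implementation), and the names SparseRowDemo, joinPresentCells and joinByColumnIndex are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

class SparseRowDemo {

    // Joins only the cells that exist, the way an iterator over present cells would.
    static String joinPresentCells(Map<Integer, String> row, String delimiter) {
        return String.join(delimiter, row.values());
    }

    // Joins a fixed number of columns, emitting an empty field for every missing cell.
    static String joinByColumnIndex(Map<Integer, String> row, int columnCount, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int c = 0; c < columnCount; c++) {
            sb.append(row.getOrDefault(c, ""));
            if (c < columnCount - 1) sb.append(delimiter);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical row: column 2 was never filled in, so it is simply absent.
        Map<Integer, String> row = new TreeMap<>();
        row.put(0, "US");
        row.put(1, "US Oracle AP");
        row.put(3, "31836");

        System.out.println(joinPresentCells(row, "\t"));     // three fields, content shifted left
        System.out.println(joinByColumnIndex(row, 4, "\t")); // four fields, the gap is preserved
    }
}
```

Looping over column indexes up to a known count is what keeps every value in its own column.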
So the whole code would be as simple as this:
import org.apache.poi.ss.usermodel.*;
import java.io.*;

class ExcelToText {

    static final String DEFAULT_LINE_END = System.getProperty("line.separator");

    static void convertXlsToText(InputStream inputStream, String delimiter, OutputStream outputStream) throws Exception {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the workbook even if an exception is thrown
        try (Workbook workbook = WorkbookFactory.create(inputStream)) {
            DataFormatter dataFormatter = new DataFormatter(java.util.Locale.US);
            FormulaEvaluator formulaEvaluator = workbook.getCreationHelper().createFormulaEvaluator();
            Sheet sheet = workbook.getSheetAt(0);
            // the header row determines how many columns are written for every row
            Row headerRow = sheet.getRow(0);
            int cellCount = 0;
            if (headerRow != null) {
                cellCount = headerRow.getLastCellNum();
            }
            if (cellCount > 0) {
                for (Row row : sheet) {
                    for (int c = 0; c < cellCount; c++) {
                        // missing cells come back as blank cells, which format to an empty string
                        Cell cell = row.getCell(c, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
                        String cellValue = dataFormatter.formatCellValue(cell, formulaEvaluator);
                        sb.append(cellValue);
                        if (c < cellCount - 1) sb.append(delimiter);
                    }
                    sb.append(DEFAULT_LINE_END);
                }
            }
        }
        // write the collected text as UTF-8
        try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(outputStream, java.nio.charset.StandardCharsets.UTF_8))) {
            bw.append(sb);
        }
    }

    public static void main(String[] args) throws Exception {
        convertXlsToText(new FileInputStream("./Excel.xlsx"), "\t", new FileOutputStream("./Data.txt"));
    }
}
No CellType checking and no separate formula evaluation are needed.
To your other requirement: A delimited text file should only contain the real content, separated by the delimiter. The content should not be manipulated. So prepending spaces to the content, or filling it up with spaces to a fixed width, is not a good idea in my opinion. If the delimiter is a tab, for example, then only the tab stops set in the text viewer should affect how the file looks. Additional spaces will only get in the way.
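The effect of padding on the data can be checked by splitting a written line back into fields. This is only a small sketch in plain Java; the class PaddingRoundTrip and the method joinPadded are hypothetical names, and the padding mimics the "%-20s" format from the updated question code.

```java
class PaddingRoundTrip {

    // Joins the fields after padding each one to 20 characters.
    static String joinPadded(String[] fields, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            sb.append(String.format("%-20s", fields[i]));
            if (i < fields.length - 1) sb.append(delimiter);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"US", "", "31836"};

        String plain = String.join("\t", fields);
        String padded = joinPadded(fields, "\t");

        // A limit of -1 makes split keep trailing empty fields.
        String[] fromPlain = plain.split("\t", -1);
        String[] fromPadded = padded.split("\t", -1);

        System.out.println("[" + fromPlain[0] + "]");  // the original value
        System.out.println("[" + fromPadded[0] + "]"); // the value plus trailing spaces
        System.out.println(fromPadded[1].isEmpty());   // the empty field now holds 20 spaces
    }
}
```

Any program that reads the padded file would have to trim every field, and it could no longer tell an empty cell from a cell that really contained spaces.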
Upvotes: 2
Reputation: 4700
To get the value of a formula cell rather than the formula itself, evaluate the cell first:
FormulaEvaluator evaluator = myWorkbook.getCreationHelper().createFormulaEvaluator();
CellValue cellValue = evaluator.evaluate(cell); // where cell is your formula cell
// CellType enum constants, as in the question's code; older POI versions used
// int constants such as Cell.CELL_TYPE_BOOLEAN instead
switch (cellValue.getCellType()) {
    case BOOLEAN:
        System.out.println(cellValue.getBooleanValue());
        break;
    case NUMERIC:
        System.out.println(cellValue.getNumberValue());
        break;
    case STRING:
        System.out.println(cellValue.getStringValue());
        break;
    case BLANK:
        break;
    case ERROR:
        break;
    default:
        break;
}
EDIT:
Regarding the alignment issue, check this: How can I pad a String in Java?
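As one possible approach to padding, String.format with a width specifier can be used. The helper names padRight and padLeft below are made up for this sketch.

```java
class PadDemo {

    // Pads with spaces on the right up to the given minimum width (left-aligned).
    static String padRight(String s, int width) {
        return String.format("%-" + width + "s", s);
    }

    // Pads with spaces on the left up to the given minimum width (right-aligned).
    static String padLeft(String s, int width) {
        return String.format("%" + width + "s", s);
    }

    public static void main(String[] args) {
        System.out.println("[" + padRight("US", 6) + "]");
        System.out.println("[" + padLeft("US", 6) + "]");
        // The width is only a minimum, so longer strings are not truncated.
        System.out.println("[" + padRight("USORACLEAP", 6) + "]");
    }
}
```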
Upvotes: 2