saran3h

Reputation: 14022

Excel to text conversion: properly handling formulas and empty cells

I'm trying to convert an Excel file into a tab-separated text file via Apache POI. The Excel file has some cells that contain formulas and some empty cells.

Here's a sample of the original Excel file: [image not included]

Here's an extract of the final output:

'US'    'USORACLEAP'    SYSTEMREFERENCE SUPPLIERID  SUPPLIERNAME    CLASSIFICATION  VENDOR_SITE_CODE    SUPPLIERADDRESS1    SUPPLIERADDRESS2    STATE   ZIPCODE COUNTRY SOURCE  INVOICENUM  INVOICEDATE PAYMENTDATE LINE_DESC   GL_COMPANY  GL_CODE GL_DESCR    COSTCENTER  CC_DESCR    CURRENCY_CODE   CHECK_NUMBER    NUM_DOCS    SPEND   TERM    PAYMENT_METHOD  SYSTEM_APPROVED PO_DISTRIBUTION_ID  WALKER_COST_CENTER  RGL_LEDGER_ENTITY   
US  US Oracle AP        RANDBETWEEN(3000,100000)    "TEXT "&D2  VENDOR  "TEXT "&D3  "TEXT "&D3  "TEXT "&D3  ONTARIO RIGHT(D2,5) US  "TEXT "&D3  "TEXT "&D3  RANDBETWEEN(43831, 44150)   RANDBETWEEN(44105,44135)    "TEXT "&D3  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000, 60000)   "TEXT "&D3  "TEXT "&D3  "TEXT "&D3  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   Check           "TEXT"&D2   X2  
US  US Oracle AP        31836   "TEXT "&D3  1099    "TEXT "&D4  "TEXT "&D4  "TEXT "&D4  NY  RIGHT(D3,5) US  "TEXT "&D4  "TEXT "&D4  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D4  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D4  "TEXT "&D4  "TEXT "&D4  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   Check           GSUEDCM03   AF2 
US  US Oracle AP        3504    "TEXT "&D4  VENDOR  "TEXT "&D5  "TEXT "&D5  "TEXT "&D5  NY  RIGHT(D4,5) US  "TEXT "&D5  "TEXT "&D5  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D5  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D5  "TEXT "&D5  "TEXT "&D5  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   ACH         GSUEIT001   AF3 
US  US Oracle AP        3504    "TEXT "&D5  VENDOR  "TEXT "&D6  "TEXT "&D6  "TEXT "&D6  NY  RIGHT(D5,5) US  "TEXT "&D6  "TEXT "&D6  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D6  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D6  "TEXT "&D6  "TEXT "&D6  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   ACH         GSUEIT001   AF4 
US  US Oracle AP        3504    "TEXT "&D6  VENDOR  "TEXT "&D7  "TEXT "&D7  "TEXT "&D7  NY  RIGHT(D6,5) US  "TEXT "&D7  "TEXT "&D7  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D7  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D7  "TEXT "&D7  "TEXT "&D7  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   ACH         GSUEIT001   AF5 

As you can see, the 1st row contains the column headers. Some of the cells (D1) were written out as the formula text instead of the calculated value. The 3rd column doesn't have any values, so everything after it shifted to the left in the text file.

Here's the code:

private void convertXlsToText(InputStream inputStream, String delimiter, File targetFile) throws IOException {
        StringBuilder sb = new StringBuilder();
        setMinInflateRatio(0);
        try (Workbook wb = create(inputStream)) {
            Sheet firstSheet = wb.getSheetAt(0);

            for (Row nextRow : firstSheet) {
                Iterator<Cell> cellIterator = nextRow.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    switch (cell.getCellType()) {
                        case STRING:
                            sb.append(cell.getStringCellValue()).append(delimiter);
                            break;
                        case BOOLEAN:
                            sb.append(cell.getBooleanCellValue()).append(delimiter);
                            break;
                        case NUMERIC:
                            sb.append(cell.getNumericCellValue()).append(delimiter);
                            break;
                        case FORMULA:
                            sb.append(cell.getCellFormula()).append(delimiter);
                            break;
                        default:
                            sb.append(EMPTY).append(delimiter);
                    }
                }
                sb.append(DEFAULT_LINE_END);
            }
        }

        dumpStringBuilderToFile(sb, targetFile);
    }

Can someone please point out what changes I should make in my code to fix the alignment and the formula issue? PS: I'm using TAB (\t) as my delimiter.

UPDATE: Here's the updated code after suggestions.

    private void convertXlsToText(InputStream inputStream, String delimiter, File targetFile) throws IOException {
        StringBuilder sb = new StringBuilder();
        setMinInflateRatio(0);
        try (Workbook wb = create(inputStream)) {
            Sheet firstSheet = wb.getSheetAt(0);
            FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
            DataFormatter formatter = new DataFormatter();
            for (Row nextRow : firstSheet) {
                Iterator<Cell> cellIterator = nextRow.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    if (cell != null) {
                        sb.append(format("%-20s", formatter.formatCellValue(cell, evaluator))).append(delimiter);
                    } else {
                        sb.append(format("%-20s", EMPTY)).append(delimiter);
                    }
                }
                sb.append(DEFAULT_LINE_END);
            }
        }

        dumpStringBuilderToFile(sb, targetFile);
    }

Upvotes: 1

Views: 451

Answers (2)

Axel Richter

Reputation: 61860

If the requirement is to write Excel data into a text file, then all cell values need to be obtained as String. A convenient way to do so is Apache POI's DataFormatter. With DataFormatter you get cell values as they are shown in the Excel sheet, i.e. with number formats and date formats applied. And if you use DataFormatter together with a FormulaEvaluator, formulas are evaluated and the evaluated values are converted to String.

To avoid losing empty cells, one needs to get the cell count first, because the cell iterator skips empty cells. For example, the cell count of the header row can serve as the cell count for every further row.

So the whole code would be as simple as this:

import org.apache.poi.ss.usermodel.*;
import java.io.*;

class ExcelToText {
 static final String DEFAULT_LINE_END = System.getProperty("line.separator");

 static void convertXlsToText(InputStream inputStream, String delimiter, OutputStream outputStream) throws Exception {
  StringBuilder sb = new StringBuilder();
  Workbook workbook = WorkbookFactory.create(inputStream);
  DataFormatter dataFormatter = new DataFormatter(java.util.Locale.US);
  FormulaEvaluator formulaEvaluator = workbook.getCreationHelper().createFormulaEvaluator();
  String cellValue = "";
  Sheet sheet = workbook.getSheetAt(0);
  Row headerRow = sheet.getRow(0);
  int cellCount = 0;
  if (headerRow != null) {
   cellCount = headerRow.getLastCellNum();
  }
  if (cellCount > 0) {
   for (Row row : sheet) {
    for (int c = 0; c < cellCount; c++) {
     Cell cell = row.getCell(c, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
     cellValue = dataFormatter.formatCellValue(cell, formulaEvaluator);
     sb.append(cellValue);
     if (c < cellCount-1) sb.append(delimiter);
    }
    sb.append(DEFAULT_LINE_END);
   }
  }
  workbook.close();
  BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(outputStream, java.nio.charset.StandardCharsets.UTF_8));
  bw.append(sb);
  bw.flush();
  bw.close();
 }

 public static void main(String[] args) throws Exception {
  convertXlsToText(new FileInputStream("./Excel.xlsx"), "\t", new FileOutputStream("./Data.txt"));
 }
}

No CellType checking or separate formula evaluation is needed.
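The effect of iterating only over existing cells versus walking a fixed column count can be illustrated without POI. The following is a minimal plain-Java sketch, not POI code: the sparse row is modelled as a sorted map from column index to value, and `SparseRowDemo`, `joinExisting`, and `joinByIndex` are hypothetical names chosen for this illustration.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

class SparseRowDemo {
    // Joins only the cells that exist, the way a cell iterator would: gaps vanish.
    static String joinExisting(TreeMap<Integer, String> row, String delimiter) {
        return String.join(delimiter, row.values());
    }

    // Walks every column index up to columnCount and writes "" for missing cells.
    static String joinByIndex(Map<Integer, String> row, int columnCount, String delimiter) {
        StringJoiner joiner = new StringJoiner(delimiter);
        for (int c = 0; c < columnCount; c++) {
            joiner.add(row.getOrDefault(c, ""));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> row = new TreeMap<>();
        row.put(0, "US");
        row.put(1, "US Oracle AP");
        row.put(3, "31836"); // column 2 was never created (assumed sample data)

        // Number of fields a reader would see after splitting on the delimiter.
        System.out.println(joinExisting(row, "\t").split("\t", -1).length);
        System.out.println(joinByIndex(row, 4, "\t").split("\t", -1).length);
    }
}
```

With the index-based walk, the value from column 3 stays in the fourth field, which corresponds to what `row.getCell(c, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK)` achieves in the POI code above.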

To your other requirement: A delimited text file should only contain real content separated by the delimiter. There should not be content manipulation. So prepending spaces to the content or padding it with spaces up to a fixed width is not a good idea in my opinion. If the delimiter is a tab, for example, then only the tab stops set in the text viewer should affect the view. Additional spaces will only get in the way.
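To make the concern about padding concrete, here is a small plain-Java sketch under stated assumptions: `PaddingRoundTripDemo`, `writePadded`, and `writeRaw` are hypothetical names, and the sample values are made up. It writes one line with each value padded to 20 characters (the same `%-20s` format used in the question's updated code) and one line with the raw values, then splits both lines again the way a consumer of the file might.

```java
class PaddingRoundTripDemo {
    // Pads each value to 20 characters before joining with the delimiter.
    static String writePadded(String[] values, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(String.format("%-20s", values[i]));
        }
        return sb.toString();
    }

    // Joins the raw values; the delimiter alone separates the fields.
    static String writeRaw(String[] values, String delimiter) {
        return String.join(delimiter, values);
    }

    public static void main(String[] args) {
        String[] values = {"US", "", "31836"}; // assumed sample data

        String[] padded = writePadded(values, "\t").split("\t", -1);
        String[] raw = writeRaw(values, "\t").split("\t", -1);

        // Brackets make trailing spaces visible in the console.
        System.out.println("padded: [" + padded[0] + "] empty field is empty: " + padded[1].isEmpty());
        System.out.println("raw:    [" + raw[0] + "] empty field is empty: " + raw[1].isEmpty());
    }
}
```

In the padded line the reader gets back values with trailing spaces, and the empty cell is no longer empty, so every consumer would have to trim the fields first.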

Upvotes: 2

Renis1235

Reputation: 4700

To get the evaluated value of a formula cell rather than the formula itself, see the implementation below:

    FormulaEvaluator evaluator = myWorkbook.getCreationHelper().createFormulaEvaluator();

    CellValue cellValue = evaluator.evaluate(cell); // where cell is your formula cell

    // getCellType() returns a CellType enum here, as in the question's code
    switch (cellValue.getCellType()) {
        case BOOLEAN:
            System.out.println(cellValue.getBooleanValue());
            break;
        case NUMERIC:
            System.out.println(cellValue.getNumberValue());
            break;
        case STRING:
            System.out.println(cellValue.getStringValue());
            break;
        case BLANK:
            break;
        case ERROR:
            break;
        default:
            break;
    }

EDIT:

Regarding the alignment issue, see the question "How can I pad a String in Java?"

Upvotes: 2
