Greg Fullard
Greg Fullard

Reputation: 385

Apache POI How to add a custom DataFormatter for handling 13 digit integers as strings, not numbers

I'm building a XLSX processor that transforms a XLSX into a CSV file. Because the files can get quite big, I'm using the event-based approach using XSSFSheetXMLHandler

This works perfectly fine, but my XLSX files contains long numbers (13 digits) which are unique identification numbers, not real numbers. When running my code on a Windows machine it correctly extracts the numbers, but when running on a Linux machine it converts it to E-notation.

For example: the source value is 7401075293087. On windows this is correctly extracted into my CSV, but on Linux the value comes through as 7.40108E+12

The problem with the XSSFSheetXMLHandler is that it reads the XLSX under the covers and then throws events that are caught by a SheetContentsHandler that you need to implement. Once of the method in the SheetContentsHandler is a cell method with the signature: cell(String cellReference, String formattedValue, XSSFComment comment)

As your can see, this method already received the formatted cell (so in my case it receives "7.40108E+12"). All the rest of the logic happens under the covers.

Based on my investigations I believe the solution lies in defining a custom DataFormatter that will specifically treat 13 digit integers as a string, instead of formatting them as E-notation.

Unfortunately my plan didn't work as expected and I couldn't find an help online. Below is an extract of my code. I tried the following in the processSheet method:

     Locale locale = new Locale.Builder().setLanguage("en").setRegion("ZA").build(); 
     DataFormatter formatter = new DataFormatter(locale);
     Format format = new MessageFormat("{0,number,full}");
     formatter.addFormat("#############", format);

Here's an extract of my code:

The main body of the code:

 public void process(String Filename)throws IOException, OpenXML4JException, ParserConfigurationException, SAXException {
     ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);
     XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
     StylesTable styles = xssfReader.getStylesTable();
     XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
     while (iter.hasNext()) {
          InputStream stream = iter.next();
          String sheetName = iter.getSheetName();
          outStream = new FileOutputStream(Filename);
          logger.info(sheetName);
          this.output = new  PrintWriter(Filename);
          processSheet(styles, strings, new SheetToCSV(), stream);
          logger.info("Done with Sheet   :"+sheetName);
          output.flush();
          stream.close();
          outStream.close();
          output.close();
         ++index; 
     }
 } 

 public void processSheet(StylesTable styles,ReadOnlySharedStringsTable strings,SheetContentsHandler sheetHandler, InputStream sheetInputStream)
         throws IOException, ParserConfigurationException, SAXException {

     InputSource sheetSource = new InputSource(sheetInputStream);
     try {
         XMLReader sheetParser = SAXHelper.newXMLReader();
         ContentHandler handler = new XSSFSheetXMLHandler(styles, null, strings, sheetHandler, formatter, false);
         sheetParser.setContentHandler(handler);
         sheetParser.parse(sheetSource);
      } catch(ParserConfigurationException e) {
         throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
      }
 }

And here's the custom handler:

private class SheetToCSV implements SheetContentsHandler {
         private boolean firstCellOfRow = false;
         private int currentRow = -1;
         private int currentCol = -1;

     private void outputMissingRows(int number) {

         for (int i=0; i<number; i++) {
             for (int j=0; j<minColumns; j++) {
                 output.append(',');
             }
             output.append('\n');
         }
     }

     public void startRow(int rowNum) {
         // If there were gaps, output the missing rows
         outputMissingRows(rowNum-currentRow-1);
         // Prepare for this row
         firstCellOfRow = true;
         currentRow = rowNum;
         currentCol = -1;
     }

     public void endRow(int rowNum) {
         // Ensure the minimum number of columns
         for (int i=currentCol; i<minColumns; i++) {
             output.append(',');
         }
         output.append('\n');
     }

     public void cell(String cellReference, String formattedValue,
             XSSFComment comment) {
         logger.info("CellRef :: Formatted Value   :"+cellReference+" :: "+formattedValue);              
         if (firstCellOfRow) {
             firstCellOfRow = false;
         } else {
             output.append(',');
         }

         // gracefully handle missing CellRef here in a similar way as XSSFCell does
         if(cellReference == null) {
             cellReference = new CellRangeAddress(currentRow, currentCol, currentCol, currentCol).formatAsString();
         }

         // Did we miss any cells?
         int thisCol = (new CellReference(cellReference)).getCol();
         int missedCols = thisCol - currentCol - 1;
         for (int i=0; i<missedCols; i++) {
             output.append(',');
         }
         currentCol = thisCol;

         // Number or string?
         try {
             Double.parseDouble(formattedValue);
             output.append(formattedValue);
         } catch (NumberFormatException e) {
             //formattedValue = formattedValue.replaceAll("\\t", "");
             //formattedValue = formattedValue.replaceAll("\\n", "");
             //formattedValue = formattedValue.trim();
             output.append('"');
             output.append(formattedValue.replace("\"", "\\\"").trim());
             output.append('"');
         }
     }

     public void headerFooter(String text, boolean isHeader, String tagName) {
         // Skip, no headers or footers in CSV
     }

    @Override
    public void ovveriddenFormat(String celRef, int formatIndex,
            String formatedString) {
        // TODO Auto-generated method stub

    }

 }

Upvotes: 0

Views: 1981

Answers (2)

Axel Richter
Axel Richter

Reputation: 61852

Cannot reproducing if the file is generated using Excel and the cells containing the 13 digit numbers are formatted using number format 0 or #, not General.

But what is meant with "running on a Linux machine"? If I am creating the *.xlsx file using Libreoffice Calc having the cells containing the 13 digit numbers formatted using number format General, then Calc will showing them as 13 digit numbers but Excel will not. For showing the numbers 13 digit in Excel the cells must be formatted using number format 0 or #.

The apache poi DataFormatter is made to work like Excel would do. And Excel shows values from 12 digits on as scientific notation when formatted using General.

You could changing this behavior using:

...
    public void processSheet(
            StylesTable styles,
            ReadOnlySharedStringsTable strings,
            SheetContentsHandler sheetHandler, 
            InputStream sheetInputStream) throws IOException, SAXException {
        DataFormatter formatter = new DataFormatter();
        formatter.addFormat("General", new java.text.DecimalFormat("#.###############"));
...

Upvotes: 3

Luigi D&#39;Amico
Luigi D&#39;Amico

Reputation: 701

DZONE wrote a cracking article on this: https://dzone.com/articles/simple-string-representation-of-java-decimal-numbe

Another answer from StackOverflow is:

Row row = sheet.getRow(0);
Object o = getCellValue(row.getCell(0));
System.out.println(new BigDecimal(o.toString()).toPlainString());

REF: Apache POI DataFormatter Returns Scientific Notation

I didn't test your actual problem on a linux machine.. however I hope this provides some answers in the midst of the night!

Upvotes: 1

Related Questions