Rengasami Ramanujam
Rengasami Ramanujam

Reputation: 1878

Parsing Excel file without Apache POI

I know that we can use Apache POI to parse an Excel file and get data. But I heard of a strange thing that excel file can be passed in a similar way we parse CSV (Like just read the file from file Stream and separate each column value with a "comma" separator). When we parse Excel we have to use tab as a delimiter. Is it possible? If yes then why Apache has come up with such a complicated framework. I am puzzled. Can someone help me?

Upvotes: 5

Views: 30337

Answers (5)

Don
Don

Reputation: 1

Sadly it is not comma seperated like a .csv file. But if you just want to grap the table content out of an .xlsx file, without all the fancy ApachePOI stuff. You can use my simple parser:

How to:

  • copy all code into some class.
  • call fromXLSX, where the first argument is the .xlsx file, and the second argument is the name of the excel sheet (leaving it null will select the first sheet by default)
  • it returns an String[#ROWS][#COLUMN] with the contents of the table.

Disclaimer: Only tested for string and number values. Not sure if its working with formulars or dates

public static String[][] fromXLSX(File xlsx, String sheet) {
    FileInputStream fis;
    try {
        fis = new FileInputStream(xlsx);
        ZipInputStream zis = new ZipInputStream(fis);
        ZipEntry ze = zis.getNextEntry();
        
        LinkedList<CellData> cellDatas = null;
        String[] sharedStrings = null;
        
        while(ze != null) {
            String fileName = ze.getName();
            
            if ((sheet == null && fileName.contains("worksheets")) || (sheet != null && fileName.endsWith(sheet + ".xml"))) {
                cellDatas = cellDataFromXLSXextractedXML(readZipEntryToString(zis));
            } else if (fileName.endsWith("sharedStrings.xml")) {
                sharedStrings = sharedStringsFromXML(readZipEntryToString(zis));
            }
            
            
            zis.closeEntry();
            ze = zis.getNextEntry();
        }
        
        zis.closeEntry();
        zis.close();
        fis.close();
        
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE, maxX = 0, maxY = 0;
        
        for(CellData c : cellDatas) {
            int x = c.getCol();
            int y = c.getRow();
            
            if (x < minX)
                minX = x;
            if (x > maxX)
                maxX = x;
            if (y < minY)
                minY = y;
            if (y > maxY)
                maxY = y;
            
            //replace String values
            if(c.string) {
                c.value = sharedStrings[c.getValueInt()];
            }
        }
        int w = maxX - minX + 1;
        int h = maxY - minY + 1;
        
        String[][] values = new String[h][w];
        
        for(CellData c : cellDatas) {
            values[c.getRow() - minY][c.getCol() - minX] = c.value;
        }
        
        return values;
    
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    
    return null;
}

private static String readZipEntryToString(ZipInputStream zis) throws IOException {
    byte[] buffer = new byte[1024];
    int len;
    
    StringBuilder xml = new StringBuilder();
    while ((len = zis.read(buffer)) > 0) {
        String str = new String(buffer, 0, len);
        xml.append(str);
    }
    return xml.toString();
}

private static String[] sharedStringsFromXML(String xml) {
    LinkedList<String> res = new LinkedList<>();
    
    int index = 0;
    while((index = xml.indexOf("<t>", index)) != -1) {
        
        int end = xml.indexOf("</t>", index);
        res.add(xml.substring(index + 3, end));
        index = end;
    }
    return (String[]) res.toArray(new String[res.size()]);
}

private static LinkedList<CellData> cellDataFromXLSXextractedXML(String sheetXml) {
    String sheetData = sheetXml.substring(sheetXml.indexOf("<sheetData>")  + "<sheetData>".length(), sheetXml.indexOf("</sheetData>"));
    
    LinkedList<CellData> cellData = new LinkedList<>();
    
    int index = 0;
    while((index = sheetData.indexOf("<c r=\"", index)) != -1) {
        String cellID = sheetData.substring(index + 6, sheetData.indexOf('"', index + 6));
        int endOfTag = sheetData.indexOf('>', index);
        int typeString = sheetData.indexOf("t=\"s\"", index);
        
        boolean isStr = typeString != -1 && typeString < endOfTag;
        int valOpen = sheetData.indexOf("<v>", endOfTag);
        int valClose = sheetData.indexOf("</v>", valOpen);
        
        String value = sheetData.substring(valOpen + 3, valClose);
        
        cellData.add(new CellData(cellID, value, isStr));
        index = endOfTag;
    }
    
    return cellData;
}

private static class CellData {
    public String cellID;
    public String value;
    public boolean string;
    
    public CellData(String cellID, String value, boolean string) {
        this.cellID = cellID;
        this.value = value;
        this.string = string;
    }
    
    private int getCol() {
        String rev_id = new StringBuilder(cellID).reverse().toString();
        int pos = 0;
        int sum = 0;
        while(Character.isDigit(rev_id.charAt(pos))) {
            pos++;
        }
        int offs = pos;
        
        while(pos < rev_id.length()) {
            sum += (((int)rev_id.charAt(pos)) - ((int)'A')) * (powInt(26, pos - offs));
            pos++;
        }
        return sum;
    }
    
    private int getRow() {
        int pos = 0;
        while(!Character.isDigit(cellID.charAt(pos))) {
            pos++;
        }
        return Integer.parseInt(cellID.substring(pos));
    }
    
    public int getValueInt() {
        return Integer.parseInt(value);
    }
    
    private int powInt(int base, int to) {
        int res = 1;
        for(int i = 0; i < to; i++) {
            res *= base;
        }
        return res;
    }
    
    @Override
    public String toString() {
        return "Cell: [" + getCol() + ", " + getRow() + "]: " + value + " (str: " +string + ")";
    }
}

Upvotes: 0

Baisakha Chauhan
Baisakha Chauhan

Reputation: 31

InputStream is = new FileInputStream(new File(filepath));
        StreamingReader reader=null;
        try {
            reader = StreamingReader.builder()
                    .rowCacheSize(100)     
                    .bufferSize(4096)     
                    .sheetIndex(0)        
                    .read(is);
        } catch (Exception e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }finally{
            is.close();
        }
        //pass here to reader and itrate it 
          for (Row row : reader) {
            if (row.getRowNum()!=0){
                for (Cell cell : row) {
              // write ur logic to store ur value 
                }

            }
        }

Upvotes: 0

Shahrukh A.
Shahrukh A.

Reputation: 1101

i tried to read/write excel file without using any external JAR like POI or any other. I am able to write file as xls format. Here is my Code

FileWriter fwriter = new FileWriter(file,true);
writer = new BufferedWriter(fwriter);
writer.newLine();
writer.write("a"    + "\t");
writer.write("b"    + "\t");
writer.write("c"    + "\t");
writer.write("d"    + "\t");
writer.write("e"    + "\t");
writer.write("f"    + "\t");

Reading File here is my code for reading

if(file != null) {
            BufferedReader reader = null;
            try {
                reader = new BufferedReader(new FileReader(file));
                String line;
                while((line = reader.readLine()) != null) {
                    String[] component = line.split("\\t");
                }
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                if(reader != null) {
                    try {
                        reader.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }

Upvotes: 0

maksimov
maksimov

Reputation: 5811

CSV is a text format, so it can be parsed using the delimiters. Old Excel is a binary and proprietary format so it needs clever decoding. The new Excel format is zipped XMLs, but one should also understand the structure of this document before it could be transformed into something as simple as reading cells one by one. So the answer to your question is no, you'll need to use Apache POI, and also - there's nothing wrong with that.

As a side note, on the path to become a good developer you will need to learn to do a bit of your own research before looking for help. Get your hands dirty is the best way to learn things.

Upvotes: 5

Michael Berry
Michael Berry

Reputation: 72254

You've probably confused what you've heard, or the person telling you was confused.

Some parts of Excel files can be stored (somewhat) as CSV files as the tabular data structure fits well within a CSV file format. However, if you save in CSV format then you just get plain text in each cell - you lose all formatting information, any graphs, multiple worksheets and so on.

The native XLS excel format is what Apache POI works with, and so can handle everything in excel, not just restrictive plain text in certain cells. CSV files have their uses but they're definitely not a straight replacement for normal Excel files.

Upvotes: 2

Related Questions