Abhishek Singh
Abhishek Singh

Reputation: 9765

Excel sheet POI validation: Out Of Memory Error

I am trying to validate an Excel file using Java before dumping it to database.

Here is my code snippet which causes error.

try {
    fis = new FileInputStream(file);
    wb = new XSSFWorkbook(fis);
    XSSFSheet sh = wb.getSheet("Sheet1");
    for(int i = 0 ; i < 44 ; i++){
        XSSFCell a1 = sh.getRow(1).getCell(i);
        printXSSFCellType(a1);
    }
    
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Here is the error I get

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.ArrayList.<init>(Unknown Source)
    at java.util.ArrayList.<init>(Unknown Source)
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:78)
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:75)
    at org.apache.xmlbeans.impl.values.NamespaceContext.getNamespaceContextStack(NamespaceContext.java:98)
    at org.apache.xmlbeans.impl.values.NamespaceContext.push(NamespaceContext.java:106)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1273)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.stringValue(XmlObjectBase.java:1484)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.getStringValue(XmlObjectBase.java:1492)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTCellImpl.getR(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFCell.<init>(XSSFCell.java:105)
    at org.apache.poi.xssf.usermodel.XSSFRow.<init>(XSSFRow.java:70)
    at org.apache.poi.xssf.usermodel.XSSFSheet.initRows(XSSFSheet.java:179)
    at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:143)
    at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:130)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:286)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:207)
    at com.xls.validate.ExcelValidator.main(ExcelValidator.java:79)

This works perfectly fine when the .xlsx file is less than 1 MB.

I understand this is because my .xlsx file is around 5-10 MB and POI tries to load the entire sheet at once in JVM memory.

What can be a possible workaround?

Upvotes: 14

Views: 40290

Answers (8)

Sandeep Jain
Sandeep Jain

Reputation: 1261

For resolving error of out of memory : heap space

Need to increase Heap space

On a 32-bit JVM, the largest heap size you can theoretically set is 4gb. To use a larger heap size, you need to use a 64-bit JVM. Try the following Run as application

java -Xmx512 -d32 abcd.jar or java -Xmx6144m -d64 abcd.jar

The -d64 flag is important as this tells the JVM to run in 64-bit mode.

Upvotes: 0

To resolve Outofmemery error, follow this.

You can not modify existing cells in a SXSSFWorkbook, but you can create the new file along with your modification using SXSSFWorkbook.

It's possible by passing the workbook object along with rowaccesswindow size.

SXSSFWorkbook workbook = new SXSSFWorkbook(  new XSSFWorkbook(new FileInputStream(file)),100);
//Your changes in workbook
workbook.write(out);

Upvotes: 0

To resolve Outofmemery error follow this.

You can not modify existing cells in a SXSSFWorkbook but you can create the new file along with your modification using SXSSFWorkbook.

It's possible by passing the workbook object along with rowaccesswindow size.

SXSSFWorkbook workbook = new SXSSFWorkbook(  new XSSFWorkbook(new FileInputStream(file)),100);

//Your changes in workbook

workbook.write(out);

Upvotes: 0

Gautam Tadigoppula
Gautam Tadigoppula

Reputation: 1032

You can use SXSSF workbook from POI for memory related issues. Refer here

I faced the similar issue while reading and merging multiple CSVs into a single XLSX file. I had a total of 3 csv sheets each with 30k rows totalling to 90k.

It got resolved by using SXSFF as below,

    public static void mergeCSVsToXLSX(Long jobExecutionId, Map<String, String> csvSheetNameAndFile, String xlsxFile) {
    try (SXSSFWorkbook wb = new SXSSFWorkbook(100);) { // keep 100 rows in memory, exceeding rows will be flushed to
                                                       // disk
      csvSheetNameAndFile.forEach((sheetName, csv) -> {
        try (CSVReader reader = new CSVReader(new FileReader(csv))) {
          wb.setCompressTempFiles(true);
          SXSSFSheet sheet = wb.createSheet(sheetName);
          sheet.setRandomAccessWindowSize(100);

          String[] nextLine;
          int r = 0;
          while ((nextLine = reader.readNext()) != null) {
            Row row = sheet.createRow((short) r++);
            for (int i = 0; i < nextLine.length; i++) {
              Cell cell = row.createCell(i);
              cell.setCellValue(nextLine[i]);
            }
          }
        } catch (IOException ioException) {
          logger.error("Error in reading CSV file {} for jobId {} with exception {}", csv, jobExecutionId,
              ioException.getMessage());
        }
      });

      FileOutputStream out = new FileOutputStream(xlsxFile);
      wb.write(out);
      wb.dispose();
    } catch (IOException ioException) {
      logger.error("Error in creating workbook for jobId {} with exception {}", jobExecutionId,
          ioException.getMessage());
    }
  }

Upvotes: 3

user2615534
user2615534

Reputation:

Well, here's a link with some detailed info about your error, and how to fix it: http://javarevisited.blogspot.com/2011/09/javalangoutofmemoryerror-permgen-space.html?m=1.

Well, let me try to explain your error:

The java.lang.OutOfMemoryError has two variants. One in the Java Heap Space, and the other in PermGen Space.

Your error could be caused by a memory leak, a low amount of system RAM, or very little RAM allocated to the Java Virtual Machine.

The difference between the Java Heap Space and PermGen Space variants is that PermGen Space stores pools of Strings and data on the primitive types, such as int, as well as how to read methods and classes, the Java Heap Space works differently. So if you have a lot of strings or classes in your project, and not enough allocated/system RAM, you will get an OutOfMemoryError. The default amount of RAM the JVM allocates to PermGen is 64 MB, which is quite a small bit of memory space. The linked article explains much more about this error and provides detailed information about how to fix this.

Hope this helps!

Upvotes: 0

Meer Nasirudeen
Meer Nasirudeen

Reputation: 7

I too faced the same issue of OOM while parsing xlsx file...after two days of struggle, I finally found out the below code that was really perfect;

This code is based on sjxlsx. It reads the xlsx and stores in a HSSF sheet.

           [code=java] 
            // read the xlsx file
       SimpleXLSXWorkbook = new SimpleXLSXWorkbook(new File("C:/test.xlsx"));

        HSSFWorkbook hsfWorkbook = new HSSFWorkbook();

        org.apache.poi.ss.usermodel.Sheet hsfSheet = hsfWorkbook.createSheet();

        Sheet sheetToRead = workbook.getSheet(0, false);

        SheetRowReader reader = sheetToRead.newReader();
        Cell[] row;
        int rowPos = 0;
        while ((row = reader.readRow()) != null) {
            org.apache.poi.ss.usermodel.Row hfsRow = hsfSheet.createRow(rowPos);
            int cellPos = 0;
            for (Cell cell : row) {
                if(cell != null){
                    org.apache.poi.ss.usermodel.Cell hfsCell = hfsRow.createCell(cellPos);
                    hfsCell.setCellType(org.apache.poi.ss.usermodel.Cell.CELL_TYPE_STRING);
                    hfsCell.setCellValue(cell.getValue());
                }
                cellPos++;
            }
            rowPos++;
        }
        return hsfSheet;[/code]

Upvotes: -1

Gagravarr
Gagravarr

Reputation: 48366

There are two options available to you. Option #1 - increase the size of your JVM Heap, so that Java has more memory available to it. Processing Excel files in POI using the UserModel code is DOM based, so the whole file (including parsed form) needs to be buffered into memory. Try a question like this one for advice on how to increase the help.

Option #2, which is more work - switch to event based (SAX) processing. This only processes part of the file at a time, so needs much much less memory. However, it requires more work from you, which is why you might be better throwing a few more GB of memory at the problem - memory is cheap while programmers aren't! The SpreadSheet howto page has instructions on how to do SAX parsing of .xlsx files, and there are various example files provided by POI you can look at for advice.

.

Also, another thing - you seem to be loading a File via a stream, which is bad as it means even more stuff needs buffering into memory. See the POI Documentation for more on this, including instructions on how to work with the File directly.

Upvotes: 14

Luca Basso Ricci
Luca Basso Ricci

Reputation: 18413

Use Event API (HSSF Only).

The event API is newer than the User API. It is intended for intermediate developers who are willing to learn a little bit of the low level API structures. Its relatively simple to use, but requires a basic understanding of the parts of an Excel file (or willingness to learn). The advantage provided is that you can read an XLS with a relatively small memory footprint.

Upvotes: 1

Related Questions