Radheya

Reputation: 821

Error: GC overhead limit exceeded in XSSFWorkbook

I am working with the Java Apache POI library and dealing with huge Excel sheets: approximately 10 MB of data with lots of rows and columns. There are also 8-10 different sheets in one Excel file. The data is not in rich text format, but full of internal functions and formulas, e.g. =SUM(A2:A4) and so on, which I am not concerned with.

This image is just for illustration purposes. The functions in the actual data are very different and much more complex:

[image not preserved]

The data includes strings, numbers and Boolean values. My only concern is to make XSSF read the values as normal text, excluding all the formulas or functions that are applied in Excel. That is, in the image above I only want to read the values in the rows and columns, i.e. 10, 20, 30 etc., Numbers, Total.

Problem

If I edit the Excel sheets, remove all formulas and functions, and save the data in simple rich text format, my code runs. However, when I leave the Excel files unmodified and keep the data in the format shown above, I run into the GC overhead limit exceeded error.

What I want

I just want to read Excel files full of formulas and functions as they are. My algorithm works when I remove all the formulas and keep the text in the sheets in normal rich text format.

What I tried

As suggested in other resources online and on Stack Overflow, I tried the first approach, shown in the code below:

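    // Wrap the input stream in an OPCPackage before handing it to XSSFWorkbook
    FileInputStream fis = new FileInputStream(path);
    OPCPackage opc = OPCPackage.open(fis);
    XSSFWorkbook workbook = new XSSFWorkbook(opc); // the error is thrown here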

Rather than passing the FileInputStream in directly, I first passed it through OPCPackage. It still shows the same error, and the code does not execute past the XSSFWorkbook workbook line.

I then tried a second approach with XSSFReader. Below is the code:

    XSSFReader xssfReader = new XSSFReader(opc);
    SharedStringsTable sst = xssfReader.getSharedStringsTable();
    XSSFReader.SheetIterator itr = (XSSFReader.SheetIterator) xssfReader.getSheetsData();

    while (itr.hasNext()) {
        InputStream sheetStream = itr.next();
        if (itr.getSheetName().equals(sheetName)) {

            // no idea how to extract sheet like I would do in XSSFWorkbook
            // I only get Sheet name of desired sheet

        } // if ends here
    } // while ends here
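For the step marked "no idea how to extract sheet" in the loop above, one possible direction is to parse the sheet stream as XML with a SAX handler. In the sheet XML, each cell is a `<c>` element; a formula cell carries the formula text in `<f>` and the last computed result in `<v>`, so reading only `<v>` yields the values without the formulas. The following is a minimal sketch using only the JDK's SAX parser. The class name `CachedValueReader`, the method `readValues`, and the plain `List<String>` standing in for the shared strings table are all hypothetical names for illustration, not POI API, and inline strings (`t="inlineStr"`) are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CachedValueReader {

    /**
     * Streams one sheet's XML and returns "ref=value" entries built from the
     * cached <v> values only. Formula text inside <f> is ignored.
     * sharedStrings is a stand-in (assumption) for the workbook's shared strings.
     */
    public static List<String> readValues(InputStream sheetXml, List<String> sharedStrings)
            throws Exception {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String ref;        // cell reference, e.g. "B2"
            private String type;       // cell type attribute, "s" = shared string index
            private boolean inValue;   // true while inside a <v> element
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("c")) {
                    ref = atts.getValue("r");
                    type = atts.getValue("t");
                } else if (qName.equals("v")) {
                    inValue = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inValue) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("v")) {
                    inValue = false;
                    String raw = text.toString();
                    // Shared-string cells store an index into the string table
                    String value = "s".equals(type)
                            ? sharedStrings.get(Integer.parseInt(raw))
                            : raw;
                    out.add(ref + "=" + value);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Tiny hand-written sheet fragment: B2 holds a formula plus its cached result
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\" t=\"s\"><v>0</v></c><c r=\"B1\"><v>10</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\" t=\"s\"><v>1</v></c>"
                + "<c r=\"B2\"><f>SUM(B1:B1)</f><v>10</v></c></row>"
                + "</sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readValues(in, Arrays.asList("Numbers", "Total")));
        // prints [A1=Numbers, B1=10, A2=Total, B2=10]
    }
}
```

In the loop above, `sheetStream` would play the role of the `sheetXml` argument. Because the handler only keeps one cell's state at a time, memory use stays roughly independent of the sheet size, apart from the returned list.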

Nothing works for me so far, and if I use XSSFWorkbook it throws the GC overhead limit exceeded error. So currently I am manually removing all formulas and functions, after which the algorithm works, but this is not an efficient way to deal with the problem. Any help or suggestions are appreciated.

EDIT:

As pointed out in the link here, I tried allocating more memory, but it's still not working. Below are some screenshots of my attempts to allocate more memory.

[two screenshots of the memory settings, not preserved]

If I am doing something wrong in allocating memory, let me know and I will make the needed change.

New Edit

I have solved my problem, as mentioned in centic's comment below, by adding -Xmx8192m to my run configuration in Eclipse. I am now looking into other ways of solving the memory issue by using SXSSFWorkbook, as already discussed in the answer below.

Upvotes: 2

Views: 5717

Answers (2)

centic

Reputation: 15872

Posting my comment as an answer:

The memory settings you show are for the Eclipse IDE and Java Web Start. How are you actually starting your application? If you run it as an application or unit test inside Eclipse, you need to adjust the memory settings in the run configuration instead, so that they actually apply when your own code is running.
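To check whether a heap setting such as -Xmx actually reaches the JVM that runs your code, you can print the maximum heap from inside the application. This is a small sketch using only the standard `Runtime` API; the class name `HeapCheck` and the method `maxHeapMiB` are made up for illustration.

```java
public class HeapCheck {

    /** Maximum amount of memory the running JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run this with the same run configuration as the failing code
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

If the printed value does not change after editing the settings, the option was most likely applied to a different JVM (for example the IDE itself) rather than to the run configuration.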

Upvotes: 1

tomgeraghty3

Reputation: 1254

Have you tried opening the file as an SXSSF workbook instead of an XSSF workbook?

    FileInputStream fis = new FileInputStream(path);
    OPCPackage opc = OPCPackage.open(fis);
    XSSFWorkbook workbook = new XSSFWorkbook(opc);
    // Wrap the loaded workbook in the streaming (SXSSF) implementation
    SXSSFWorkbook wb = new SXSSFWorkbook(workbook);

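See https://poi.apache.org/apidocs/org/apache/poi/xssf/streaming/SXSSFWorkbook.html. Taken directly from their JavaDoc: "This allows to write very large files without running out of memory as only a configurable portion of the rows are kept in memory at any one time". Note that the quote is about writing: the XSSFWorkbook in the snippet is still loaded fully into memory first, so this may not help on its own if the error occurs while opening the file.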

Upvotes: 2
