Sangeeta Rawat

Reputation: 309

Java Heap Error in Talend

I am trying to read a 30-60 MB Excel file in Talend, but it fails with the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(Unknown Source)
at java.lang.StringBuffer.<init>(Unknown Source)
at org.apache.xmlbeans.impl.store.Locale$ScrubBuffer.<init>(Locale.java:1804)
at org.apache.xmlbeans.impl.store.Locale.getScrubBuffer(Locale.java:1904)
at org.apache.xmlbeans.impl.store.Xobj.getValueAsString(Xobj.java:1205)
at org.apache.xmlbeans.impl.store.Xobj.fetch_text(Xobj.java:1796)
at org.apache.xmlbeans.impl.values.XmlObjectBase.get_wscanon_text(XmlObjectBase.java:1332)
at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1269)
at org.apache.xmlbeans.impl.values.JavaLongHolder.longValue(JavaLongHolder.java:53)
at org.apache.xmlbeans.impl.values.XmlObjectBase.getLongValue(XmlObjectBase.java:1502)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTRowImpl.getR(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFRow.getRowNum(XSSFRow.java:322)
at org.apache.poi.xssf.usermodel.XSSFSheet.initRows(XSSFSheet.java:229)
[statistics] disconnected
[statistics] disconnected
[statistics] disconnected
[statistics] disconnected
[statistics] disconnected
at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:193)
at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:180)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:300)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:221)
at de.cimt.talendcomp.tfileexcelpoi.SpreadsheetFile.initializeWorkbook(SpreadsheetFile.java:281)
at cdif_validation.validation_partsourcingtab3_0_1.Validation_PartSourcingTab3.tFileExcelWorkbookOpen_1Process(Validation_PartSourcingTab3.java:774)
at cdif_validation.validation_partsourcingtab3_0_1.Validation_PartSourcingTab3.tFileList_1Process(Validation_PartSourcingTab3.java:696)
at cdif_validation.validation_partsourcingtab3_0_1.Validation_PartSourcingTab3.runJobInTOS(Validation_PartSourcingTab3.java:8268)
at cdif_validation.validation_partsourcingtab3_0_1.Validation_PartSourcingTab3.runJob(Validation_PartSourcingTab3.java:8110)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.tRunJob_3Process(CDIF_Validation.java:1175)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.tForeach_1Process(CDIF_Validation.java:983)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.tFileList_1Process(CDIF_Validation.java:820)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.tJava_1Process(CDIF_Validation.java:634)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.tPrejob_1Process(CDIF_Validation.java:529)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.runJobInTOS(CDIF_Validation.java:1885)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.main(CDIF_Validation.java:1722)

I have tried increasing -Xms up to 1 GB and -Xmx up to 6 GB. I also tried using disk to store temporary data. How can I resolve this problem?

Upvotes: 2

Views: 1960

Answers (2)

Balazs Gunics

Reputation: 2077

The default Talend components use far more memory than they should. I believe this is because the 60 MB xlsx file is fully uncompressed and held in memory, even if you are only reading one row from one tab.

To work around this limitation, you can either keep adding memory or look for a different approach. I usually use the tFileExcel components from Jan Lolling, which you can get from the Talend marketplace.
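Before adding more memory, it may be worth confirming that the -Xmx value actually reaches the JVM that runs the job. The following is a minimal sketch using only the standard `Runtime` API; the class name `HeapCheck` is hypothetical, and the same call could presumably be placed in a tJava component at the start of the job.

```java
public class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // If this prints far less than the -Xmx you configured,
        // the setting is not being applied to this JVM.
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

If the printed value is much lower than expected, the heap setting is probably being applied somewhere other than the process that opens the workbook.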

Upvotes: 0

edasssus

Reputation: 351

I had the same problem with Apache POI. It's because POI has a large memory footprint. You can try a workaround such as processing your file in smaller parts, or try to look at this and do some "optimization".

Also see point 14 of the Apache POI FAQ for some recommendations.
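To illustrate what processing the file in smaller parts can look like: an .xlsx file is a ZIP archive of XML parts, so a sheet can be read as a stream instead of being loaded into an in-memory object model, which is where the stack trace in the question runs out of heap. The sketch below uses only the JDK (`java.util.zip` and StAX) and simply counts `<row>` elements. The class name `StreamingRowCounter` is hypothetical, and the entry name `xl/worksheets/sheet1.xml` is an assumption about how the workbook is laid out. It is not what Talend or POI does internally, although POI also ships an event-based reader (`XSSFReader`) built on the same idea.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingRowCounter {

    /**
     * Counts <row> elements in one sheet part of an .xlsx file.
     * Only the current XML event is held in memory, not the whole sheet.
     */
    static long countRows(Path xlsx, String sheetEntry)
            throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            ZipEntry entry = zip.getEntry(sheetEntry);
            if (entry == null) {
                throw new IOException("No such entry: " + sheetEntry);
            }
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Sheet XML needs no DTDs; disabling them avoids external lookups.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);

            long rows = 0;
            try (InputStream in = zip.getInputStream(entry)) {
                XMLStreamReader reader = factory.createXMLStreamReader(in);
                try {
                    while (reader.hasNext()) {
                        if (reader.next() == XMLStreamConstants.START_ELEMENT
                                && "row".equals(reader.getLocalName())) {
                            rows++; // real code would read the cells here
                        }
                    }
                } finally {
                    reader.close();
                }
            }
            return rows;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed entry name; check the archive listing for the actual sheet part.
        System.out.println(countRows(Paths.get(args[0]), "xl/worksheets/sheet1.xml"));
    }
}
```

Because rows are handled one at a time, memory use stays roughly constant regardless of file size. The trade-off is that shared strings, styles and formulas have to be resolved by hand or by a streaming-aware library.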

Upvotes: 1
