Thomas

Reputation: 2070

SAX Parser - OutOfMemoryError: Java heap space

I need SAX parsing because I want to check for maliciously malformed XML. It's the first time I'm using this library.

I created an XML file (18MB) which contains an attribute with a very, very long name.

    <?xml version="1.0"?>
    <company>
        <staff>
            <firstname VERYLONGATTRIBUTENAME...VERYLONGATTRIBUTENAME="some value">yong</firstname>
            <lastname>mook kim</lastname>
            <nickname>mkyong</nickname>
            <salary>100000</salary>
        </staff>
        <staff>
            <firstname>low</firstname>
            <lastname>yin fong</lastname>
            <nickname>fong fong</nickname>
            <salary>200000</salary>
        </staff>
    </company>

I just call the SAXParser like this:

    saxParser.parse("test.xml", handler);

All of the event handlers are completely empty, yet an OutOfMemoryError: Java heap space occurs. Why does this happen? I chose SAX because it is stream/event based, so I expected it to handle this kind of input without the memory problems DOM would have.

EDIT: I increased the length of attribute name by doubling it every time. It worked until I reached this 18MB file.
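For reference, the setup described above can be sketched roughly as follows. This is a minimal sketch, not the exact code from my project: it assumes the default JAXP `SAXParserFactory`, builds the document in memory instead of reading test.xml, and the class and method names (`LongAttributeRepro`, `buildXml`, `parses`) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reproduction helper; names are illustrative only.
public class LongAttributeRepro {

    // Builds a small document whose <firstname> element carries one
    // attribute with a name of the requested length.
    static String buildXml(int nameLength) {
        StringBuilder name = new StringBuilder(nameLength);
        for (int i = 0; i < nameLength; i++) {
            name.append('A');
        }
        return "<?xml version=\"1.0\"?>"
                + "<company><staff>"
                + "<firstname " + name + "=\"some value\">yong</firstname>"
                + "</staff></company>";
    }

    // Parses the document with a handler whose callbacks all do nothing.
    // Returns false if the parser rejects the input with an exception;
    // an OutOfMemoryError is an Error and is deliberately not caught.
    static boolean parses(String xml) {
        try {
            SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            saxParser.parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Double the attribute name length on each round, as in the EDIT above.
        // The upper bound here is kept small on purpose.
        for (int length = 16; length <= 512; length *= 2) {
            System.out.println(length + " chars: parsed=" + parses(buildXml(length)));
        }
    }
}
```

Raising the upper bound in `main` is how the doubling experiment would be repeated; how far it gets depends on the JVM, its heap size, and the parser configuration.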

EDIT 2: Stack trace

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.StringValue.from(StringValue.java:24)
    at java.lang.String.<init>(String.java:178)
    at com.sun.org.apache.xerces.internal.util.SymbolTable$Entry.<init>(SymbolTable.java:338)
    at com.sun.org.apache.xerces.internal.util.SymbolTable.addSymbol(SymbolTable.java:178)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanName(XMLEntityScanner.java:726)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1523)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1320)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2756)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:647)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:277)
    at com.thundercloud.httpfilter.XMLParser.test(XMLParser.java:150)
    at com.thundercloud.httpfilter.HTTPInterceptor.main(HTTPInterceptor.java:34)

Thanks in advance

Upvotes: 1

Views: 4048

Answers (3)

whummer

Reputation: 497

You may want to check out ScaleDOM, which allows parsing very large XML files: https://github.com/whummer/scaleDOM

ScaleDOM has a small memory footprint due to lazy loading of XML nodes. It only keeps a portion of the XML document in memory and re-loads nodes from the source file when necessary.

Upvotes: 0

Lan

Reputation: 6660

You can find your memory settings in Eclipse under Run->Run Configuration. Under Java application, find the name of the class you are trying to run, select it, and click the Arguments tab. What is set in the VM Arguments section? If it is empty, add the value below to the VM Arguments section.

-Xms512M -Xmx1024M

Also, there is a JDK6 bug in which the SAX parser throws OutOfMemoryError. The affected versions are JDK6 before update 14. Please check your Java version to make sure it does not apply to you.
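To confirm that the VM arguments actually reached the JVM and to see which Java version is running, something like the following could be run from the same Run Configuration. It is only a sketch; the class name `JvmInfo` and its helper methods are made up for this example, while `Runtime.maxMemory()` and the `java.version` system property are standard.

```java
// Hypothetical helper class for checking the running JVM's settings.
public class JvmInfo {

    // Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Version string of the running Java runtime.
    static String javaVersion() {
        return System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + javaVersion());
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

If the reported maximum heap does not roughly match the -Xmx value, the arguments were probably added to a different run configuration.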

Edit: based on the comment, I have modified my answer and suggest adding the VM settings below to the VM arguments section:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath="c:\temp\oomdump.hprof"

Then you can use a tool like Eclipse MAT (http://www.eclipse.org/mat/) to analyze the dump file and see what the real issue is.

Upvotes: 1

Kishore

Reputation: 839

First of all, I don't think any attribute name will be that long. Try increasing the heap size, and then check.

java -Xms<min_size> -Xmx<max_size> -jar <your_jar>

Upvotes: 0
