Risser

Reputation: 585

Java substring operation seems to cause Out of Memory Error in Java 1.8

This is a distillation of some generated code we have that is causing problems now that we've switched to 1.8.

Can someone help explain why this compiles and runs in Java 1.6, but causes an Out of Memory Error in 1.8? Also, it seems to run fine in 1.8 if you comment out the set.add(s1) line.

I'm pretty sure it's not because I'm storing the 5-character substrings in a set. It ought to be able to handle 12,000 of those. Plus, it works in 1.6, even if I change the line to set.add(new String(s1)) or set.add(s1 + " ") to try and force the creation of new strings.

package put.your.package.here;

import java.util.HashSet;
import java.util.Set;

public class SubstringTest {

    public static void main(String[] args) {
        String s = buildArbitraryString();
        System.out.println(System.getProperty("java.version") + "::" + s.length());
        Set<String> set = new HashSet<String>();
        while (s.length() > 0) {
            s = whackString(s, set);
        }
    }

    private static String whackString(String s, Set<String> set) {
        String s1 = s.substring(0, 5);
        String s2 = s.substring(5);
        s = s2;
        set.add(s1);
        System.out.println(s1 + " :: " + set.size());
        return s;
    }

    private static String buildArbitraryString() {
        StringBuffer sb = new StringBuffer(60000);
        for (int i = 0; i < 15000; i++)
            sb.append(i);
        String s = sb.toString();
        return s;
    }
}

Any ideas?

JVM Version Info:

java.vm.name=IBM J9 VM
java.fullversion=
    JRE 1.8.0 IBM J9 2.8 Windows 7 amd64-64 Compressed References 20160210_289934 (JIT enabled, AOT enabled)
    J9VM - R28_Java8_SR2_20160210_1617_B289934
    JIT  - tr.r14.java_20151209_107110.04
    GC   - R28_Java8_SR2_20160210_1617_B289934_CMPRSS
    J9CL - 20160210_289934

edited to add JVM info

Upvotes: 3

Views: 2011

Answers (2)

Risser

Reputation: 585

Okay, we've done a lot more digging and we think we've found the problem. In the WAS/IBM Java 1.6 implementation, the substring call looks like this:

return ((beginIndex == 0) && (endIndex == count)) ? this :
    new String(offset + beginIndex, endIndex - beginIndex, value);

We verified this with a debugger. Each new String uses that same main array with different offsets and counts. Works like a charm.

In the WAS/IBM Java 1.8 version we have, the substring call looks like this:

if (!disableCopyInSubstring) {
    return new String (offset + start, end - start, value, false);
} else {
    return new String (offset + start, end - start, value);
}

The disableCopyInSubstring flag is always false, which makes sense. We don't want to disable copying the data into a new array. That copying is supposed to fix the memory leak that reusing the same char array over and over causes. That means substring calls the following constructor (edited for brevity):

if (start == 0) {
    value = data;
} else {
    value = new char[length];
    System.arraycopy(data, start, value, 0, length);
}
offset = 0;
count = length;

So, basically, if the start of the substring is '0', it keeps the entire original char array. For some reason, if the start is '0', it neglects to fix the memory leak. On purpose. It's the worst of both worlds.

So, yeah. In our program, we do the 0-5 substring, and because this implementation doesn't create a new array when start is 0, it stores the whole giant array with a count of 5. Then we do the second substring, lopping off the first 5 characters. This does create a new array for the new String. Then, next cycle, we do the short substring again, which holds on to that entire copy of the giant string minus five chars, then we lop five more off and make a new string.

Over and over again we go, storing a full copy of the slightly shorter string each time, just chewing up memory.
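
For a rough sense of scale, here is a back-of-the-envelope sketch (not part of the original code). It assumes every 5-character prefix pins the full array it was cut from, and that a char takes 2 bytes. The class and method names are made up for illustration. Since the set only keeps distinct prefixes, the real figure would be lower, so treat this as an upper bound.

```java
public class RetainedCharsEstimate {

    // Builds the same string as buildArbitraryString() in the question
    // and returns its length.
    static int arbitraryStringLength() {
        StringBuilder sb = new StringBuilder(60000);
        for (int i = 0; i < 15000; i++) {
            sb.append(i);
        }
        return sb.length();
    }

    // Upper bound on the number of chars kept alive, assuming each prefix
    // retains the whole backing array of the string it was taken from,
    // and the string shrinks by `step` chars per iteration.
    static long totalRetainedChars(int length, int step) {
        long total = 0;
        for (int remaining = length; remaining > 0; remaining -= step) {
            total += remaining;
        }
        return total;
    }

    public static void main(String[] args) {
        int length = arbitraryStringLength();
        long chars = totalRetainedChars(length, 5);
        System.out.println("string length: " + length);
        System.out.println("retained chars (upper bound): " + chars);
        // Assumption: 2 bytes per char, ignoring object headers.
        System.out.println("approx. bytes: " + chars * 2);
    }
}
```

For the 63,890-character test string this bound works out to roughly 408 million chars, or about 800 MB, which would be consistent with running out of heap under default settings.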

The solution is to surround the substring(0,5) call with new String(). I did this and it worked like a charm on this test case. But we're dealing with a generated class and we don't have access to the generator, so that's not an option for us.
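
A minimal sketch of that workaround, for anyone who can change the calling code. The helper name copiedPrefix is hypothetical. Whether new String(String) actually trims the backing array depends on the JVM's String implementation; it did on the IBM J9 build described here, but that is not guaranteed elsewhere.

```java
public class PrefixCopy {

    // Hypothetical helper: take the first `length` chars and wrap the
    // result in new String(...) so the small string does not keep a
    // reference to the large backing array (assumed behavior on this JVM).
    static String copiedPrefix(String s, int length) {
        return new String(s.substring(0, length));
    }

    public static void main(String[] args) {
        String big = "0123456789";
        String prefix = copiedPrefix(big, 5);
        System.out.println(prefix);
    }
}
```

In the test case from the question, this would replace the s.substring(0, 5) call in whackString.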

Edit: Dale found this

/**
 * When the System Property == true, then disable copying in String.substring (int) and
 * String.substring (int, int) methods whenever offset is non-zero. Otherwise, enable copy.
 */
String disableCopyInSubstringProperty = getProperty("java.lang.string.substring.nocopy"); //$NON-NLS-1$
String.disableCopyInSubstring = disableCopyInSubstringProperty != null && 
    disableCopyInSubstringProperty.equalsIgnoreCase("true"); //$NON-NLS-1$
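
System properties like this one are normally passed at launch with -D, for example -Djava.lang.string.substring.nocopy=true. Whether setting it restores the 1.6-style array sharing is an assumption based on the comment above; we have not confirmed it here. The sketch below only mirrors the parsing logic from that snippet, so you can check what value a given JVM launch would see. The class and method names are made up.

```java
public class SubstringNoCopyFlag {

    // Property name taken from the IBM snippet above.
    static final String PROPERTY = "java.lang.string.substring.nocopy";

    // Same test as the snippet: enabled only when the value is
    // "true", ignoring case; null or anything else means disabled.
    static boolean parseFlag(String raw) {
        return raw != null && raw.equalsIgnoreCase("true");
    }

    public static void main(String[] args) {
        String raw = System.getProperty(PROPERTY);
        System.out.println(PROPERTY + " = " + raw);
        System.out.println("nocopy enabled: " + parseFlag(raw));
    }
}
```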

Upvotes: 3

Stefan Mondelaers

Reputation: 877

I don't have a complete answer, but I can't comment because I don't have enough credits for that.
You should read the answer in the following post: substring method in String class causes memory leak

It explains that the implementation of substring changed. I think you should check the impact of the large substrings returned by whackString, and whether garbage collection cleans them up fast enough, because they consume a lot more memory under the new implementation of substring.

Upvotes: 0
