maqjav

Reputation: 2434

Stream.forEach OutOfMemoryError reading large file

I'm using Java 11 and reading a file of around 600 MB in which every line has the same length (274 characters).

This is the code I'm using:

Path tempFile;
try (final Stream<String> stream = Files.lines(largeFilePath, StandardCharsets.ISO_8859_1).sorted()) {
    tempFile = Files.createTempFile(null, null);
    stream.forEach(e -> {
        if (StringUtils.startsWith(e, "aa")) {
            try {
                Files.write(tempFile, (e + System.lineSeparator()).getBytes(), StandardOpenOption.APPEND);
            } catch (final IOException e1) {
                throw new RuntimeException(e1);
            }
        }
    });
} catch (final Exception e) {
    throw e;
}

This is the error:

java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringUTF16.compress(StringUTF16.java:160) ~[?:?]
    at java.lang.String.<init>(String.java:3214) ~[?:?]
    at java.lang.String.<init>(String.java:276) ~[?:?]
    at java.io.BufferedReader.readLine(BufferedReader.java:358) ~[?:?]
    at java.io.BufferedReader.readLine(BufferedReader.java:392) ~[?:?]
    at java.nio.file.FileChannelLinesSpliterator.readLine(FileChannelLinesSpliterator.java:171) ~[?:?]
    at java.nio.file.FileChannelLinesSpliterator.forEachRemaining(FileChannelLinesSpliterator.java:113) ~[?:?]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
    at mypackage.MyClass.execute(MyClass.java:103) ~[classes/:?]

The line where it crashes is:

stream.forEach(e -> {

I don't know what I'm missing here. In theory this code should be memory-safe, right? With a smaller file it works perfectly.

These are my memory settings:

-Xms512m
-Xmx512m
-XX:MaxMetaspaceSize=256m

Upvotes: 0

Views: 290

Answers (2)

Roman Puchkovskiy

Reputation: 11835

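You ask for the lines to be sorted. Sorting requires ALL of them to be read into memory first, and their total size exceeds the maximum heap you give the program. A 600 MB file with 274-character lines holds roughly 2.2 million lines. Java's compact strings store each ISO-8859-1 character in a single byte, but every line also costs a String object, an array header and a reference, so holding them all takes more than the 512 MB allowed by -Xmx512m.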
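A minimal sketch of one way to shrink what the sort has to buffer. It assumes only a small fraction of the lines start with "aa". The filter moves in front of `sorted()`, so non-matching lines are dropped as they are read. Output goes through one `BufferedWriter` instead of reopening the file with `Files.write(..., APPEND)` for every line. The names `FilterThenSort` and `filterThenSort` are hypothetical. `String.startsWith` stands in for Commons Lang's `StringUtils.startsWith`, which is equivalent here because `Files.lines` never yields null. The output is encoded as ISO-8859-1 rather than with the platform default that `getBytes()` uses.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.stream.Stream;

// Hypothetical helper: filter first, then sort only the surviving lines.
final class FilterThenSort {

    // Writes the lines of `input` that start with `prefix`, in sorted order,
    // to a new temp file and returns its path.
    static Path filterThenSort(Path input, String prefix) throws IOException {
        Path out = Files.createTempFile(null, null);
        try (Stream<String> lines = Files.lines(input, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.ISO_8859_1)) {
            // filter() sits before sorted(), so the sort only buffers matching lines.
            Iterator<String> it = lines.filter(line -> line.startsWith(prefix))
                                       .sorted()
                                       .iterator();
            // A plain loop lets writer.write() throw IOException without wrapping.
            while (it.hasNext()) {
                writer.write(it.next());
                writer.newLine();
            }
        }
        return out;
    }
}
```

If most lines do match, the matching lines alone can still exceed the heap. In that case this sketch still runs out of memory, and you are back to needing more heap or a different sorting strategy.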

Either give it more heap, or use a file-based sort, also known as external sorting (https://en.wikipedia.org/wiki/External_sorting).
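A minimal external merge sort sketch in plain Java. It assumes lines are compared in natural `String` order, the same order `Stream.sorted()` uses, and that ISO-8859-1 is the right charset. Phase 1 reads at most `linesPerChunk` lines at a time, sorts them in memory and spills each chunk to a temp file. Phase 2 merges the chunk files with a `PriorityQueue` that holds one current line per chunk. `ExternalSort`, `sort` and `linesPerChunk` are hypothetical names, and the chunk size has to be tuned to the available heap.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical external merge sort for a line-oriented text file.
final class ExternalSort {
    private static final Charset CS = StandardCharsets.ISO_8859_1;

    // Sorts the lines of `input` into `output`. Phase 1 holds at most
    // `linesPerChunk` lines in memory; phase 2 holds one line per chunk.
    static void sort(Path input, Path output, int linesPerChunk) throws IOException {
        List<Path> chunks = new ArrayList<>();
        // Phase 1: read fixed-size chunks, sort each in memory, spill it to disk.
        try (BufferedReader reader = Files.newBufferedReader(input, CS)) {
            List<String> buffer = new ArrayList<>(linesPerChunk);
            String line;
            while ((line = reader.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == linesPerChunk) {
                    chunks.add(writeSortedChunk(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                chunks.add(writeSortedChunk(buffer));
            }
        }
        // Phase 2: k-way merge using a min-heap of each chunk's current line.
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter writer = Files.newBufferedWriter(output, CS)) {
            PriorityQueue<Head> heap = new PriorityQueue<>();
            for (Path chunk : chunks) {
                BufferedReader r = Files.newBufferedReader(chunk, CS);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new Head(first, r));
            }
            while (!heap.isEmpty()) {
                Head smallest = heap.poll();
                writer.write(smallest.line);
                writer.newLine();
                String next = smallest.reader.readLine();
                if (next != null) heap.add(new Head(next, smallest.reader));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
            for (Path chunk : chunks) Files.deleteIfExists(chunk);
        }
    }

    // Sorts one in-memory chunk and writes it to its own temp file.
    private static Path writeSortedChunk(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        Path chunk = Files.createTempFile("chunk", ".txt");
        Files.write(chunk, buffer, CS);
        return chunk;
    }

    // The current line of one chunk plus the reader it came from.
    private static final class Head implements Comparable<Head> {
        final String line;
        final BufferedReader reader;

        Head(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }

        @Override
        public int compareTo(Head other) {
            return line.compareTo(other.line);
        }
    }
}
```

The merge only ever keeps one line per chunk in memory, so peak heap use is governed by `linesPerChunk` rather than by the file size. Filtering for "aa" before phase 1 would shrink the input further.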

Upvotes: 2

lavantho0508

Reputation: 155

The file is too large; processing it this way consumes too many resources. I think it would be much easier to use a database.

Upvotes: 0
