A T

Reputation: 19

How to read file of size more than 100 MB efficiently in Java

I am using Java 8 in my Spring Boot application. I fetch data from an Apache server through an HTTP call. The data is sometimes larger than 100 MB (more than 100,000 lines), and I need to convert it into a list of strings. The code below works for smaller data, but for larger data it fails with:

java.lang.OutOfMemoryError: Java heap space

This is how I am converting the data into a list of strings:

List<String> lines = null;
try {
    String data = con.sendGet();
    if (data == null) {
        throw new UserAuthException("diff is not available at the location");
    }
    diff.setLineAsString(data);
    lines = IOUtils.readLines(new StringReader(data));
    System.out.println("lines = IOUtils.readLines(new StringReader(data));");
} catch (IOException e) {
    e.printStackTrace();
}

Does IOUtils read everything into memory? What is an efficient way to read this data?

Upvotes: 0

Views: 2200

Answers (2)

Kunal Vohra

Reputation: 2846

Let's go through this in detail.

Reading in Memory

The standard way of reading the lines of the file is in memory – both Guava and Apache Commons IO provide a quick way to do just that:

Files.readLines(new File(path), Charsets.UTF_8);

FileUtils.readLines(new File(path));

The problem with this approach is that all the lines of the file are kept in memory, which leads to an OutOfMemoryError as soon as the file is large enough relative to the available heap.

For example – reading a ~1Gb file:

@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}

This starts off with a small amount of memory being consumed: (~0 Mb consumed)

[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb

However, after the full file has been processed, we have at the end: (~2 Gb consumed)

[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb

This means that about 2.1 Gb of memory is consumed by the process. The reason is simple: all the lines of the file are now stored in memory.
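The source does not say how the "Total Memory" and "Free Memory" figures were logged. A minimal sketch of one way to produce them, assuming they come from Runtime.totalMemory() and Runtime.freeMemory(), is shown below. The class and method names (HeapUsage, usedMb, logMemory) are hypothetical and exist only for illustration. The consumed amount is simply total minus free.

```java
// Sketch only: logs JVM heap figures in the same shape as the log lines above.
// HeapUsage, usedMb and logMemory are illustrative names, not from the original test.
class HeapUsage {
    private static final long MB = 1024L * 1024L;

    /** Heap in use, in megabytes: allocated heap minus the free part of it. */
    static long usedMb(long totalBytes, long freeBytes) {
        return (totalBytes - freeBytes) / MB;
    }

    /** Prints total, free and used heap as currently reported by the JVM. */
    static void logMemory() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory(); // heap currently reserved by the JVM
        long free = rt.freeMemory();   // unused part of that reserved heap
        System.out.println("Total Memory: " + total / MB + " Mb");
        System.out.println("Free Memory: " + free / MB + " Mb");
        System.out.println("Used Memory: " + usedMb(total, free) + " Mb");
    }

    public static void main(String[] args) {
        logMemory();
    }
}
```

Calling logMemory() before and after reading the file gives the two snapshots being compared. With the figures above, 2666 Mb total minus 490 Mb free is 2176 Mb, which is the "about 2.1 Gb" quoted.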

It should be obvious by this point that keeping the full contents of a file in memory will exhaust the available heap once the file is large enough, however much heap has been configured.

What's more, we usually don't need all of the lines in the file in memory at once – instead, we just need to be able to iterate through each one, do some processing and throw it away. So, this is exactly what we're going to do – iterate through the lines without holding all of them in memory.

Streaming Through the File

Let's now look at a solution – we're going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:

// try-with-resources closes the scanner and the stream, even on failure
try (FileInputStream inputStream = new FileInputStream(path);
     Scanner sc = new Scanner(inputStream, "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}

This solution iterates through all the lines in the file and allows each one to be processed without keeping a reference to it, so the lines are never all held in memory at once: (~150 Mb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb

Streaming With Apache Commons IO

The same can be achieved using the Commons IO library as well, by using the custom LineIterator provided by the library:

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}

Since the entire file is never fully in memory, this also results in fairly conservative memory consumption: (~190 Mb consumed)

[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
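In the question the data arrives over HTTP rather than from a file, but the same line-by-line idea should carry over. Below is a minimal sketch using a plain BufferedReader. It assumes the HTTP client can expose the response body as an InputStream; the question's con.sendGet() returns a String, so that part is an assumption. StreamingLines and forEachLine are hypothetical names, and the ByteArrayInputStream in main is only a stand-in for a real response body.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Sketch only: processes a stream one line at a time instead of building
// one huge String followed by a List<String> holding the same data again.
class StreamingLines {

    /** Passes each line of the stream to the handler and returns the number of lines read. */
    static long forEachLine(InputStream in, Consumer<String> handler) throws IOException {
        long count = 0;
        // UTF-8 is an assumption; use the charset the server actually sends
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line); // only the current line needs to stay reachable
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an HTTP response body, e.g. HttpURLConnection.getInputStream()
        InputStream body = new ByteArrayInputStream(
            "first\nsecond\nthird\n".getBytes(StandardCharsets.UTF_8));
        long lines = forEachLine(body, line -> System.out.println(line.length()));
        System.out.println(lines + " lines processed");
    }
}
```

If a List<String> of every line really is required, the heap has to be large enough to hold it. Streaming only helps when each line can be handled and then discarded.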


Upvotes: 2

Sabareesh Muralidharan

Reputation: 647

Reading a sufficiently large file fully into memory usually ends in an OutOfMemoryError.

If you want to convert larger files into a List<String>, there are several approaches:

1. Loading a Binary File in Chunks

try(BufferedInputStream in = new BufferedInputStream(new FileInputStream(pathname)))
{
    byte[] bbuf = new byte[4096];
    int len;
    while ((len = in.read(bbuf)) != -1) {
        // process data here: bbuf[0] thru bbuf[len - 1]
    }
}

2. Reading a Text File Line By Line

try(BufferedReader in = new BufferedReader(new FileReader(pathname))) {
    String line;
    while ((line = in.readLine()) != null) {
        // process line here.
    }
}

3. Using a Scanner

try(Scanner scanner = new Scanner(new File(pathname))) {
    while ( scanner.hasNextLine() ) {
        String line = scanner.nextLine();
        // process line here.
    }
}

4. With Java 8 Streams

try (Stream<String> lines = Files.lines(Paths.get(pathname))) { // closes the file when done
    List<String> alist = lines.collect(Collectors.toList());
    // use alist here
}
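Collecting into a list still puts every line in memory, so it does not avoid the OutOfMemoryError on its own. Files.lines is lazy, though, so when the lines only need to be examined rather than kept, the stream can be consumed directly. A minimal sketch follows; LazyLines and countMatching are hypothetical names, and counting lines that contain a keyword is just an example of per-line processing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

// Sketch only: consumes Files.lines lazily instead of collecting into a List.
class LazyLines {

    /** Counts the lines containing the keyword without materialising the whole file. */
    static long countMatching(Path file, String keyword) throws IOException {
        // try-with-resources releases the underlying file handle
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(keyword)).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // A small temporary file stands in for the large input
        Path tmp = Files.createTempFile("lazy-lines", ".txt");
        try {
            Files.write(tmp, Arrays.asList("+ added", "- removed", "+ added again"),
                        StandardCharsets.UTF_8);
            System.out.println(countMatching(tmp, "+"));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```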


Upvotes: 1
