Ivan T
Ivan T

Reputation: 1046

How to read integers from a file when performance is a concern?

I'm doing some tasks on CodeEval. Basically the task is very simple: "Print out the sum of all the integers read from the file".

My solution is following:

import java.io.File;
import java.io.IOException;
import java.io.BufferedReader;
import java.io.FileReader;

public class SumIntegersFromFile {

    public static void main(String args[]) throws IOException{

        File file = new File(args[0]);
         BufferedReader br = new BufferedReader( new FileReader(file));
         String line;
         int i=0;
         while((line=br.readLine())!=null){
            int k = Integer.parseInt(line);
             i+=k;
         }
         br.close();
         System.out.println(i);
    }
}

But I was told this solution is not optimal from a performance point of view.

The code is based on the recommendations in the question Best way to read a text file. The only difference here is I am reading integers instead of strings.

What is the most performance-efficient way to read integers from a file in Java?

Upvotes: 3

Views: 434

Answers (2)

Raedwald
Raedwald

Reputation: 48712

I see nothing wrong with the performance of your code. That is, I dispute the claim that your program has anything wrong with it.

Reading data from files, or across the network, is several orders of magnitude slower than manipulating data in memory. The performance of code that mixes I/O with some manipulation of data in memory is therefore typically dominated by the time taken for the I/O. Tweaks to the manipulation of data in memory are rarely worthwhile. If I/O operations happen in parallel with data manipulation (which will be the case if the O/S does some read-ahead), the data manipulation can be almost free: making the data manipulation faster will not decrease the time taken because any decreases in the CPU time for the data manipulation will be precisely offset by an increase in the amount of time that the program blocks while awaiting input.

Programs that do I/O and need good performance must decrease the amount of time they spend blocked awaiting I/O. They should operate in a manner that enables them to take advantage of the optimizations that the hardware and operating system provide to reduce the amount of blocking.

Importantly, at the low level, disks and networks do not operate on small numbers of bytes for each operation. They use larger units of packets or blocks. Interacting with the operating system to read fewer bytes than are stored in one disk block is wasteful. Programs avoid doing that by buffering their I/O, so the program itself changes a sequence of many small I/O operations into fewer but larger operations. You are using a BufferedReader, so you are already doing that.

The operating system is likely to do some read-ahead: if you ask for bytes in a block at the start of a file it will guess that you are probably going to read the file sequentially, so it would be worthwhile for it to also fetch some of the subsequent blocks of the file, in anticipation of your program also needing those. Reading files sequentially gives better performance. You are already doing that.

Upvotes: 1

chiastic-security
chiastic-security

Reputation: 20520

Unless you've been explicitly told otherwise, you shouldn't assume that the total will fit in an int. Try changing the type of i to a long, or even a BigInteger, and see if that makes a difference to your score.

You might try doing the same with k (and using Long.parseLong(line)). It will depend on the exact wording of the question, but perhaps the individual values could exceed the limits of an int too.

One more thing... the question, as you've phrased it, just says that you should sum all the integers. That leaves open the possibility that there will be lines that aren't integers, in which case you should skip them, rather than throwing a NumberFormatException (which is what your code will do at the moment).

(And presumably you've been told that it's one entry per line...)

But if you want to squeeze every last bit of performance out, you need to read the file as binary rather than line by line: turning each line into a String is just too expensive. A detailed account of how to do it can be found in this question on summing integers from a text file.

Upvotes: 1

Related Questions