mclopez
mclopez

Reputation: 151

Fastest way to load huge text file into a int array

I have a big text file (+100MB), each line being an integer number (containing 10 million numbers). Of course, the size and amount may change, so I don't know this in advance.

I want to load the file into a int[], making the process as fast as posible. First I came to this solution:

public int[] fileToArray(String fileName) throws IOException
{
    List<String> list = Files.readAllLines(Paths.get(fileName));
    int[] res = new int[list.size()];
    int pos = 0;
    for (String line: list)
    {
        res[pos++] = Integer.parseInt(line);
    }
    return res;
}

It was pretty fast, 5.5 seconds. Of which, 5.1s goes for the readAllLines call, and 0.4s for the loop.

But then I decided to try using BufferedReader, and came to this different solution:

public int[] fileToArray(String fileName) throws IOException
{
    BufferedReader bufferedReader = new BufferedReader(new FileReader(new File(fileName)));
    ArrayList<Integer> ints = new ArrayList<Integer>();
    String line;
    while ((line = bufferedReader.readLine()) != null)
    {
        ints.add(Integer.parseInt(line));
    }
    bufferedReader.close();

    int[] res = new int[ints.size()];
    int pos = 0;
    for (Integer i: ints)
    {
        res[pos++] = i.intValue();
    }
    return res;
}

This was even faster! 3.1 seconds, just 3s for the while loop and not even 0.1s for the for loop.

I know there is no much space here for optimization, at least in time, but using an ArrayList and then a int[] seems like too much memory to me.

Any ideas on how to make this faster, or avoid using the middle ArrayList?

Just for comparison, I do this same task with FreePascal in 1.9 seconds [see edit], using TStringList class and StrToInt function.

EDIT: Since I got a pretty short time with Java method, I had to improve the FreePascal one. 330~360ms.

Upvotes: 4

Views: 1279

Answers (1)

castletheperson
castletheperson

Reputation: 33466

If you're using Java 8, you can eliminate this middle ArrayList by using lines() and then mapping to an int, then collecting the values into an array.

You should also be using try-with-resources for proper exception handling and auto-closing.

try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
    return br.lines()
             .mapToInt(Integer::parseInt)
             .toArray();
}

I'm not sure if this is faster, but it is certainly much easier to maintain.

Edit: It is apparently MUCH faster.

Upvotes: 7

Related Questions