Pedantic

Reputation: 1378

I want to read a big text file

I want to read a big text file. What I decided to do is create four threads, have each one read 25% of the file, and then join them.

But it isn't any faster. Can anyone tell me whether I can use concurrent programming for this? My file structure has data like: name contact company policyname policynumber uniqueno

In the end, I want to put all the data into a HashMap.

thanks

Upvotes: 4

Views: 2043

Answers (5)

Olivier Croisier

Reputation: 6149

You might want to use Memory-mapped file buffers (NIO) instead of plain java.io.
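For example, a minimal sketch of the memory-mapped approach (the line-counting loop is only a placeholder for real parsing, and a single mapping is limited to about 2 GB, so a really big file would have to be mapped in several regions):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRead {
  public static void main(String[] args) throws Exception {
    RandomAccessFile file = new RandomAccessFile(args[0], "r");
    FileChannel channel = file.getChannel();
    // Map the whole file into memory, read-only.
    MappedByteBuffer buffer =
        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    long lines = 0;
    while (buffer.hasRemaining()) {
      if (buffer.get() == '\n') {   // real code would decode and parse the bytes
        lines++;
      }
    }
    channel.close();
    file.close();
    System.out.println("lines: " + lines);
  }
}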

Upvotes: 1

Sanjeev

Reputation: 1097

Well, you can take help from the link below:

http://java.sun.com/developer/technicalArticles/Programming/PerfTuning/

or use a large buffer,

or use something like this:

import java.io.*;

// Counts the lines of the file given as the first argument, using a buffered reader.
public class line1 {

  public static void main(String[] args) {
    if (args.length != 1) {
      System.err.println("missing filename");
      System.exit(1);
    }
    try {
      // BufferedReader replaces the deprecated DataInputStream.readLine()
      BufferedReader reader =
          new BufferedReader(new FileReader(args[0]));
      int cnt = 0;
      while (reader.readLine() != null)
        cnt++;
      reader.close();
      System.out.println(cnt);
    } catch (IOException e) {
      System.err.println(e);
    }
  }

}

Upvotes: 0

OregonGhost

Reputation: 23759

Reading a large file is typically limited by I/O performance, not by CPU time. You can't speed up the reading by dividing the work among multiple threads (it will more likely decrease performance, since it's still the same file on the same drive). You can use concurrent programming to process the data, but that can only improve performance once the file has been read.

You may, however, have some luck by dedicating a single thread to reading the file and delegating the actual processing to worker threads whenever a data unit has been read.
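A rough sketch of that idea, assuming whitespace-separated records keyed by their last field (the unique number from the question):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReaderWithWorkers {
  public static void main(String[] args) throws Exception {
    final Map<String, String[]> records = new ConcurrentHashMap<String, String[]>();
    ExecutorService workers =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // One thread (the main thread here) does all the I/O ...
    BufferedReader reader = new BufferedReader(new FileReader(args[0]));
    String line;
    while ((line = reader.readLine()) != null) {
      final String data = line;
      // ... and hands each line to the pool for parsing.
      // (In practice you would batch many lines per task to cut the overhead.)
      workers.execute(new Runnable() {
        public void run() {
          String[] fields = data.split("\\s+");
          records.put(fields[fields.length - 1], fields); // key on the unique number
        }
      });
    }
    reader.close();

    workers.shutdown();
    workers.awaitTermination(1, TimeUnit.HOURS); // wait for all parsing to finish
    System.out.println("records loaded: " + records.size());
  }
}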

Upvotes: 9

Peter Tillemans

Reputation: 35331

If it is a big file, chances are that it is written to disk as one contiguous chunk, and "streaming" the data would be faster than parallel reads, as those would keep moving the disk heads back and forth. To know what is fastest you need intimate knowledge of your target production environment, because on high-end storage the data will likely be distributed over multiple disks and parallel reads might be faster.

The best approach, I think, is to read it into memory in large chunks and make each chunk available as a ByteArrayInputStream for parsing.
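For example, a minimal sketch of that chunking approach (the 8 MB chunk size is arbitrary and parse() is only a placeholder):

import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

public class ChunkedRead {
  public static void main(String[] args) throws Exception {
    byte[] chunk = new byte[8 * 1024 * 1024];   // 8 MB chunks (arbitrary size)
    FileInputStream in = new FileInputStream(args[0]);
    int read;
    while ((read = in.read(chunk)) != -1) {
      // Wrap only the bytes actually read; a real parser would also have to
      // handle records that span chunk boundaries.
      InputStream chunkStream = new ByteArrayInputStream(chunk, 0, read);
      parse(chunkStream);
    }
    in.close();
  }

  private static void parse(InputStream chunkStream) {
    // placeholder for the actual parsing
  }
}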

Quite likely you will peg the CPU during parsing and handling of the data. Maybe a parallel map-reduce could help spread that load over all cores.

Upvotes: 1

aioobe

Reputation: 420921

Well, if you do it like that, you might thrash the disk cache and put high contention on the synchronization of the HashMap. I would suggest that you simply make sure the stream is buffered properly (possibly with a large buffer size). Use the BufferedReader(Reader in, int sz) constructor to specify the buffer size.
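For example, something along these lines (the 1 MB buffer size is just an illustration):

import java.io.BufferedReader;
import java.io.FileReader;

public class BigBufferRead {
  public static void main(String[] args) throws Exception {
    // 1 MB buffer instead of the default (size is just an example)
    BufferedReader reader =
        new BufferedReader(new FileReader(args[0]), 1024 * 1024);
    String line;
    int count = 0;
    while ((line = reader.readLine()) != null) {
      count++;   // parse each line here instead of just counting
    }
    reader.close();
    System.out.println(count + " lines");
  }
}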

If the bottleneck is not parsing the lines (that is, if the bottleneck is not CPU usage), you should not parallelize the task in the way described.

You could also look into memory-mapped files (available through the nio package), but that's probably only useful if you want to read and write files efficiently. A tutorial with source code is available here: http://www.linuxtopia.org/online_books/programming_books/thinking_in_java/TIJ314_029.htm

Upvotes: 0
