Reputation: 1
I have a file named sample.txt of about 2GB (for example). I want to split the file into four parts, read each part simultaneously, and write each part to another file, Sample1.txt, simultaneously.
Please help me.
Upvotes: 0
Views: 299
Reputation: 23950
I assume you know it's not possible to insert extra data in the middle of a file. So you'd need to know in advance how large Sample1.txt will be (in bytes) and what position each of the 4 blocks will start at. You would then create the file of the correct size.
You could then use a RandomAccessFile for each of the writers, each initialized with a seek() to the position (in bytes) where that block will start. The same goes for reading: each reader seeks to the position at which its block starts.
Note that this is not line oriented, but byte oriented. You almost have to assume fixed size lines in the input, and certainly in the output.
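A minimal sketch of that idea, assuming the file names from the question; the class name, the choice of the second block's offset and the "copy the rest byte-wise" part are just illustrative:

```java
import java.io.RandomAccessFile;

public class SeekDemo {
    public static void main(String[] args) throws Exception {
        // Create Sample1.txt at its final size up front, so every writer can
        // seek() into its own region without ever inserting data.
        try (RandomAccessFile source = new RandomAccessFile("sample.txt", "r");
             RandomAccessFile target = new RandomAccessFile("Sample1.txt", "rw")) {
            long total = source.length();
            target.setLength(total);

            long blockStart = total / 4; // hypothetical start offset of the second block
            source.seek(blockStart);     // reader positioned at the start of its block
            target.seek(blockStart);     // writer positioned at the matching offset
            // ... read/write this block byte by byte (or buffer by buffer) from here on
        }
    }
}
```

Each reader/writer pair would get its own RandomAccessFile instances and its own offset.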
Also note that having multiple processes reading and writing to the same file only increases speed if the processing overhead is significant. Otherwise you'll just lose speed, because the hard disk head has to move to a new position all the time.
I would probably use a single reader thread, a single writer thread, and multiple processing threads using the producer-consumer pattern.
The reader would read each line and write it to a BlockingQueue. The processors take() from that queue and write to a second BlockingQueue. The writer thread would take() from that second queue and write to disk. (Note that the original line order could be lost this way, though.)
The BlockingQueue javadoc also describes the producer-consumer pattern.
That way your slow IO is single threaded (or actually dual threaded) and the fast CPU is doing lots of processing in multiple threads.
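A rough sketch of that setup; the class name, queue capacities, the poison-pill end-of-input marker and the toUpperCase() stand-in for the real per-line work are all just assumptions for illustration:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerCopy {
    private static final String EOF = new String("EOF"); // poison pill, compared by identity

    public static void main(String[] args) throws Exception {
        final int workers = 4;
        BlockingQueue<String> toProcess = new ArrayBlockingQueue<>(1000);
        BlockingQueue<String> toWrite = new ArrayBlockingQueue<>(1000);

        // Single reader thread: produces lines onto the first queue.
        Thread reader = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(new FileReader("sample.txt"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    toProcess.put(line);
                }
                for (int i = 0; i < workers; i++) {
                    toProcess.put(EOF); // one pill per processing thread
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        // Multiple processing threads: transform each line and pass it on.
        Thread[] processors = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            processors[i] = new Thread(() -> {
                try {
                    String line;
                    while ((line = toProcess.take()) != EOF) {
                        toWrite.put(line.toUpperCase()); // placeholder for the real per-line work
                    }
                    toWrite.put(EOF); // tell the writer this processor is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Single writer thread: drains the second queue to disk until all processors finish.
        Thread writer = new Thread(() -> {
            try (PrintWriter out = new PrintWriter(new FileWriter("Sample1.txt"))) {
                int finished = 0;
                while (finished < workers) {
                    String line = toWrite.take();
                    if (line == EOF) {
                        finished++;
                    } else {
                        out.println(line);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        reader.start();
        for (Thread p : processors) p.start();
        writer.start();

        reader.join();
        for (Thread p : processors) p.join();
        writer.join();
    }
}
```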
If you don't need a lot of processing per line, forget about multiple threads. Your speed then is limited by IO and that will only get slower the more threads you use.
Upvotes: 2
Reputation: 11838
You need to create four threads. Each thread opens the file at its own position; you can calculate the position for each thread and pass it in the constructor, and you may also want to pass the size of the block. Then each thread, in a loop, reads data from the file into a buffer and writes the data to the other file. Something along these lines (the class name, buffer size and file names are just illustrative assumptions):
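```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical worker: copies one block of sample.txt to the same offsets in
// Sample1.txt, using the start position and size passed to its constructor.
class BlockCopier extends Thread {
    private final long start;
    private final long size;

    BlockCopier(long start, long size) {
        this.start = start;
        this.size = size;
    }

    @Override
    public void run() {
        try (RandomAccessFile in = new RandomAccessFile("sample.txt", "r");
             RandomAccessFile out = new RandomAccessFile("Sample1.txt", "rw")) {
            in.seek(start);
            out.seek(start);  // writing past the current end simply extends the output file
            byte[] buffer = new byte[64 * 1024];
            long remaining = size;
            while (remaining > 0) {
                int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (n == -1) break;
                out.write(buffer, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

public class FourThreadCopy {
    public static void main(String[] args) throws Exception {
        long total;
        try (RandomAccessFile f = new RandomAccessFile("sample.txt", "r")) {
            total = f.length();
        }
        long block = (total + 3) / 4; // each of the four blocks (the last one may be shorter)

        BlockCopier[] threads = new BlockCopier[4];
        for (int i = 0; i < 4; i++) {
            threads[i] = new BlockCopier(i * block, Math.min(block, total - i * block));
            threads[i].start();
        }
        for (BlockCopier t : threads) {
            t.join();
        }
    }
}
```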
Upvotes: 0
Reputation: 178451
You might want to have a look at Apache Hadoop. The framework implements MapReduce, which seems to be exactly what you need.
Upvotes: 2