Reputation: 33
I want to do a task that I've already completed except this time using multithreading. I have to read a lot of data from a file (line by line), grab some information from each line, and then add it to a Map. The file is over a million lines long so I thought it may benefit from multithreading.
I'm not sure about my approach here since I have never used multithreading in Java before. I want to have the main method do the reading, and then giving the line that has been read to another thread which will format a String, and then give it to another thread to put into a map.
public static void main(String[] args)
{
//Some information read from file
BufferedReader br = null;
String line = '';
try {
br = new BufferedReader(new FileReader("somefile.txt"));
while((line = br.readLine()) != null) {
// Pass line to another task
}
// Here I want to get a total from B, but I'm not sure how to go about doing that
}
public class Parser extends Thread
{
private Mapper m1;
// Some reference to B
public Parse (Mapper m) {
m1 = m;
}
public parse (String s, int i) {
// Do some work on S
key = DoSomethingWithString(s);
m1.add(key, i);
}
}
public class Mapper extends Thread
{
private SortedMap<String, Integer> sm;
private String key;
private int value;
boolean hasNewItem;
public Mapper() {
sm = new TreeMap<String, Integer>;
hasNewItem = false;
}
public void add(String s, int i) {
hasNewItem = true;
key = s;
value = i;
}
public void run() {
while (!Thread.currentThread().isInterrupted()) {
try {
if (hasNewItem) {
// Find if street name exists in map
sm.put(key, value);
newEntry = false;
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
// I'm not sure how to give the Map back to main.
}
}
I'm not sure if I am taking the right approach. I also do not know how to terminate the Mapper thread and retrieve the map in the main. I will have multiple Mapper threads but I have only instantiated one in the code above.
I also just realized that my Parse class is not a thread, but only another class if it does not override the run() method so I am thinking that the Parse class should be some sort of queue.
And ideas? Thanks.
EDIT: Thanks for all of the replies. It seems that since I/O will be the major bottleneck there would be little efficiency benefit from parallelizing this. However, for demonstration purpose, am I going on the right track? I'm still a bit bothered by not knowing how to use multithreading.
Upvotes: 0
Views: 390
Reputation: 86479
Frequently I/O will take much longer than the in-memory tasks. We refer to such work as I/O-bound. Parallelism may have a marginal improvement at best, and can actually make things worse.
You certainly don't need a different thread to put something into a map. Unless your parsing is unusually expensive, you don't need a different thread for it either.
If you had other threads for these tasks, they might spend most of their time sitting around waiting for the next line to be read.
Even parallelizing the I/O won't necessarily help, and may hurt. Even if your CPUs support parallel threads, your hard drive might not support parallel reads.
EDIT:
All of us who commented on this assumed the task was probably I/O-bound -- because that's frequently true. However, from the comments below, this case turned out to be an exception. A better answer would have included the fourth comment below:
Measure the time it takes to read all the lines in the file without processing them. Compare to the time it takes to both read and process them. That will give you a loose upper bound on how much time you could save. This may be decreased by a new cost for thread synchronization.
Upvotes: 2
Reputation: 2103
You may wish to read Amdahl's Law. Since the majority of your work is strictly serial (the IO) you will get negligible improvements by multi-threading the remainder. Certainly not worth the cost of creating watertight multi-threaded code.
Perhaps you should look for a new toy-example to parallelise.
Upvotes: 0
Reputation:
Why do you need multiple threads? You only have one disk and it can only go so fast. Multithreading it won't help in this case, almost certainly. And if it does, it will be very minimal from a user's perspective. Multithreading isn't your problem. Reading from a huge file is your bottle neck.
Upvotes: 6