The Stompiest

Reputation: 331

How to use multi-threading to make my application faster

I am iterating through a List of Strings with +- 1500 entries. In each iteration I am again iterating through a List of Strings, but this time with +- 35 million entries. The result of the application is perfect. But it takes the application a long time (2+ hours) to give me the result. How should I structure multithreading to make my application faster?

The order of the result List is not important.

What are my other options?

Representation of the code:

List<String> result = new ArrayList<String>();
for (Iterator<String> i = data1.iterator(); i.hasNext(); ) { // 1500 entries
    String val = i.next();
    for (Iterator<String> j = data2.iterator(); j.hasNext(); ) { // 35 million entries
        String test = j.next();
        if (val.equals(test)) {
            result.add(val);
            break;
        }
    }
}
for (Iterator<String> h = result.iterator(); h.hasNext(); ) {
    // write to file
}

UPDATE

After restructuring my code and implementing the answer given by JB Nizet my application now runs a lot faster. It now only takes 20 seconds to get to the same result! Without multi-threading!

Upvotes: 1

Views: 1585

Answers (2)

Abhijit Pritam Dutta

Reputation: 5591

I agree with your idea. Here is what you need to do:

  1. First, find the number of processors in your system.
  2. Split your records based on that number and create exactly that many threads (at most numberOfProcessors * 2; beyond that, context switching between threads will degrade performance).

Do not create an unnecessarily large number of threads; that will not speed up your application. Work out exactly how many threads to create based on the number of processors and the amount of memory in the system. Efficient parallel processing depends on your machine's hardware as well.
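The steps above could be sketched roughly as follows, using `Runtime.availableProcessors()` to size a fixed thread pool and splitting `data1` into one slice per task (the class name `ParallelLookup` and the chunking scheme are illustrative, not from the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class ParallelLookup {
    public static List<String> matches(List<String> data1, List<String> data2)
            throws InterruptedException, ExecutionException {
        int threads = Runtime.getRuntime().availableProcessors(); // step 1: count processors
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (data1.size() + threads - 1) / threads;       // step 2: split the records
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int start = 0; start < data1.size(); start += chunk) {
            List<String> slice = data1.subList(start, Math.min(start + chunk, data1.size()));
            // each task scans only its slice of data1 against data2
            futures.add(pool.submit(() ->
                slice.stream().filter(data2::contains).collect(Collectors.toList())));
        }
        List<String> result = new ArrayList<>();
        for (Future<List<String>> f : futures) {
            result.addAll(f.get()); // gather each task's matches
        }
        pool.shutdown();
        return result;
    }
}
```

Note that this parallelizes only the outer 1500-entry loop; every task still pays the O(N) cost of `contains()` on the 35-million-entry list, which is why the answer below gets a far bigger win without threads at all.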

Upvotes: 2

JB Nizet

Reputation: 691913

You could use a parallel stream:

List<String> result = 
    data1.parallelStream()
         .filter(data2::contains)
         .collect(Collectors.toList());

But since you call contains() on data2 1500 times, and since contains() is O(N) for a list, transforming it to a HashSet first could make things much faster: contains() on HashSet is O(1). You might not even need multi-threading anymore:

Set<String> data2Set = new HashSet<>(data2);
List<String> result = 
    data1.stream()
        .filter(data2Set::contains)
        .collect(Collectors.toList());
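The same idea can also be written without streams; `retainAll` on a copy of `data1` does the intersection in one call, with each membership check hitting the `HashSet` in O(1) (the class name `SetIntersection` is illustrative):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetIntersection {
    public static List<String> matches(List<String> data1, List<String> data2) {
        Set<String> data2Set = new HashSet<>(data2); // one O(N) pass to build the set
        List<String> result = new ArrayList<>(data1);
        result.retainAll(data2Set);                  // each contains() check is now O(1)
        return result;
    }
}
```

Like the original nested loop, this keeps the order (and any duplicates) of `data1`, so the result written to the file is the same.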

Upvotes: 3
