Reputation: 331
I am iterating through a List of Strings with +- 1500 entries. In each iteration I am again iterating through a List of Strings, but this time with +- 35 million entries. The result of the application is perfect. But it takes the application a long time (2+ hours) to give me the result. How should I structure multithreading to make my application faster?
The order of the result List is not important.
What are my other options?
Representation of the code:
List<String> result = new ArrayList<String>();
for(Iterator<String> i = data1.iterator();i.hasNext();){ //1500 entries
String val = i.next();
for(Iterator<String> j = data2.iterator();j.hasNext();){ //35 million entries
String test = j.next();
if(val.equals(test)){
result.add(val);
break;
}
}
}
for(Iterator<String> h = result.iterator();h.hasNext();){
//write to file
}
UPDATE
After restructuring my code and implementing the answer given by JB Nizet my application now runs a lot faster. It now only takes 20 seconds to get to the same result! Without multi-threading!
Upvotes: 1
Views: 1585
Reputation: 5591
I am also agree with your idea. What you need to do now?
- First calculate number of processor in your system.
- Based on number of processor split your records and create exactly that number of threads. ( numberofprocessor * 2 max, else because of context switching between thread performance will be degraded ).
Do not create unnecessarily lots of threads. That will not going to speedup your application. Check exactly how many threads you should create based on number of processor and size of memory in a system. Efficient parallel processing is depends on your machine hardware as well.
Upvotes: 2
Reputation: 691913
You could use a parallel stream:
List<String> result =
data1.parallelStream()
.filter(data2::contains)
.collect(Collectors.toList());
But since you call contains()
on data2
1500 times, and since contains()
is O(N) for a list, transforming it to a HashSet first could make things much faster: contains()
on HashSet is O(1). You might not even need multi-threading anymore:
Set<String> data2Set = new HashSet<>(data2);
List<String> result =
data.stream()
.filter(data2Set::contains)
.collect(Collectors.toList());
Upvotes: 3