Reputation: 8906
I have two huge sorted arrays (~100K items each). I need to intersect them really fast. Now I am doing it the standard way:
But it takes too long (~ 350 microseconds) to complete, which results in quite poor overall performance. Is there a way to do it quicker?
P.S. Intersection size is not larger that 1000 items (on average) and I need only 25 to 100 of them.
Upvotes: 6
Views: 1083
Reputation: 2608
Below code runs within around 10 millis for me. So I guess you are either processing strings or on a scripting language.
package com.example.so.algorithms;
import java.util.Arrays;
import java.util.Random;
/**
* <p> http://stackoverflow.com/questions/42538902/how-to-intersect-two-sorted-arrays-the-fastest-possible-way#comment72213844_42538902 </p>
* <p> Given two sorted sub-lists of 100k each determine the first 10 intersecting (common) entries within 350 millis </p>
* @author Ravindra
* @since 03March2017
*
*/
public class TestMergeIntersection {
/**
* <pre>
Time (millis):9
Result :[442958664, 932132404, 988442487, 1356502780, 1614742980, 1923995812, 1985016181, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
</pre>
* @param args
*/
public static void main(String[] args) {
handleTest();
}
private static void handleTest() {
int size = 1024*128;
int intersectionCount = 100;
int[] arrayOne = generateSortedSublist(size);
int[] arrayTwo = generateSortedSublist(size);
int[] result = new int[intersectionCount];
int count = 0;
int i=0;
int j=0;
long start = System.currentTimeMillis();
while(count < 100 && i < size && j < size ) {
if( arrayOne[i] < arrayTwo[j]) {
i++;
} else if( arrayOne[i] > arrayTwo[j] ) {
j++;
} else {
result[count] =arrayOne[i];
i++;
j++;
count++;
}
}
long end = System.currentTimeMillis();
System.out.println("Time (millis):"+(end-start));
System.out.println("Result :"+Arrays.toString(result));
}
private static int[] generateSortedSublist(int size) {
Random random = new Random();
int[] result = new int[size];
for(int i=0;i<result.length;i++) {
result[i] = random.nextInt(Integer.MAX_VALUE);
}
Arrays.sort(result);
return result;
}
}
Upvotes: 0
Reputation: 1426
I am posting a naive implementation of the problem/solutions: 2 arrays filled with random ints. If the threshold of 100 intersected values is reached, the loops break.
One loops using the OP logic. The other launches two threads each one processing one half of the array.
It seems that the threading overhead can be an issue. Or It may need fine tuning.
It is a 20 runs sample. Worst case scenario: no intersection that forces the run to the end of arrays. Times are in microseconds.
Workers: 2806
Workers: 4197
Workers: 4235
Workers: 818
Workers: 729
Workers: 3376
Workers: 740
Workers: 688
Workers: 2245
Workers: 732
Workers: 330
Workers: 945
Workers: 605
Workers: 630
Workers: 630
Workers: 334
Workers: 643
Workers: 309
Workers: 290
Workers: 761
done
Sorted: 1525
Sorted: 405
Sorted: 550
Sorted: 880
Sorted: 265
Sorted: 267
Sorted: 252
Sorted: 310
Sorted: 253
Sorted: 272
Sorted: 285
Sorted: 270
Sorted: 270
Sorted: 315
Sorted: 267
Sorted: 269
Sorted: 265
Sorted: 258
Sorted: 269
Sorted: 289
done
package so;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.TimeUnit;
public final class CrazyClass {
static class Feeder implements Runnable{
final int b, e;
int[] k1001;
int[] k1002;
final Set<Integer> setThis;
Feeder(int[] ia, int[] ia1, int be, int en, Set<Integer> s){
k1001 = ia;
k1002= ia1;
b = be;
e = en;
setThis = s;
}
public void run() {
int i2 = b;
for(int i1 = b; i1 < e; i1++){
if (k1001[i1] == k1002[i2]){
synchronized(setThis){
setThis.add(k1001[i1]);
if (setThis.size() == 25){
System.out.println("bye!!!");
break;
}
}
}
else if (k1001[i1] < k1002[i2])
i1++;
else if (k1001[i1] > k1002[i2])
i2++;
}
}
}
static void sorted(){
int i1 = 0, i2 = 0;
Set<Integer> result = new HashSet<Integer>();
Random r = new Random();
int[] k1001 = new int[100000];
int[] k1002 = new int[100000];
for(int i = 0; i< k1001.length; i++){
k1001[i] = r.nextInt();
k1002[i] = r.nextInt();
}
Arrays.sort(k1001);
Arrays.sort(k1002);
long l = System.nanoTime();
for(; i1 < k1001.length; i1++){
if (k1001[i1] == k1002[i2]){
result.add(k1001[i1]);
if (result.size() == 100){
System.out.println("bye!!!");
break;
}
}
else if (k1001[i1] < k1002[i2])
i1++;
else if (k1001[i1] > k1002[i2])
i2++;
}
l = System.nanoTime() - l;
System.out.println("Sorted: " + TimeUnit.MICROSECONDS.convert(l, TimeUnit.NANOSECONDS));
}
static void workers(){
Thread t1, t2;
Set<Integer> setThis = new HashSet<Integer>();
Random r = new Random();
int[] k1001 = new int[100000];
int[] k1002 = new int[100000];
for(int i = 0; i< k1001.length; i++){
k1001[i] = r.nextInt();
k1002[i] = r.nextInt();
}
t1 = new Thread(new Feeder(k1001, k1002, 0, 49999, setThis));
t2 = new Thread(new Feeder(k1001, k1002, 50000, 99999, setThis));
try{
long l = System.nanoTime();
t1.start();
t2.start();
t1.join();
t2.join();
System.out.println("Workers: " + TimeUnit.MICROSECONDS.convert(System.nanoTime() - l, TimeUnit.NANOSECONDS));
}catch(Exception x){
}
}
static public void main(String[] args){
int run = 20;
for(int i = 0; i < run; i++)
workers();
System.out.println("done");
for(int i = 0; i < run; i++)
sorted();
System.out.println("done");
}
}
Upvotes: 1
Reputation: 46409
Running through 2 100k arrays in parallel requires about 200k comparisons. You are currently completing it in 350 microseconds = 350k nanoseconds. So your per comparison time is just under 2 nanoseconds. If your CPU is around 4 GHz, then that's 8 clock cycles.
That's good. You could try to be sophisticated, detecting runs and so on, but probably you'll hurt yourself more with pipeline stalls than you'll save work.
There are only 2 ways to speed this up. Do less work, or add more workers.
You've indicating that doing less work is feasible, which is why Tamas Hegedus suggested that. Instead of creating the intersection, create an Iterator
that will return the next thing in the intersection. This will require you to rewrite the logic that uses said iterator, but you'll do under 10% of the current computation. Which is going to be close to 10x faster.
As for adding workers, you'll want to divide the work among worker threads and keep them from stepping all over each other. For k
small (no larger than your number of CPUs!), with a logarithmic amount of work in the size of your arrays, you can do a quickselect to find k-1
values that break the combined arrays into k
even chunks (oops Adapt http://www.geeksforgeeks.org/median-of-two-sorted-arrays/ instead of doing a quickselect...), and the indexes of those values in each array. This creates k
problems of even difficulty, each of which can be specified as 4 numbers. Spin up k
threads and let each get a chunk of the answer. This will be roughly k
times faster than what you are currently doing.
At the cost of a lot more effort, these approaches can be combined. What you do is have the iterator create, say, 4 workers and hand out blocks to each. When you call iter.next()
the iterator will hand you a next value that it has if it has one. If it doesn't have one it will wait for the worker that is producing its next block to complete, grab that block, hand that worker another block if one is ready, and then hand out the first value in that block. You can play with the block size. You want it large enough that the CPU does a good job of figuring out that it should be streaming from RAM to CPU caches, and doesn't think that there is synchronization contention between threads.
My guess given the size and synchronization constraints, the hybrid approach won't be much of a win, if any, over the iterator approach. But if you're really desperate, you can try it.
Upvotes: 2