Reputation: 87
I am evaluating an SDK and I need to cross compare ~15000 iris images stored in a gallery folder and generate the similarity scores as a 15000 x 15000 matrix.
So I pre-processed all the images and stored the processed blobs in an ArrayList. Then I'm using multiple threads with 2 'for' loops in the run method to call the 'compare' method (from SDK) and pass the index of an ArrayList as parameters to compare those respective blobs and save the integer return values in an excel sheet using Apache poi library. The performance is very inefficient (each comparison takes ~40ms) and the whole task takes a lot of time (~100 days estimated with 8 cores running at 100%) to do all the 225,000,000 comparisons. Please help me understand this bottle neck.
Multithreading code
int processors = Runtime.getRuntime().availableProcessors();
ExecutorService executor = Executors.newFixedThreadPool(processors);
for(int i =0; i<processors; i++) {
//each thread compares 1875 images with 15000 images
Runnable task = new Thread(bloblist,i*1875,i*1875+1874);
executor.execute(task);
}
executor.shutdown();
Run Method
public void run(){
for(int i = startIndex; i<= lastIndex; i++) {
for(int j=0;j<15000;j++){
compare.compareIris(bloblist.get(i),bloblist.get(j));
score= compare.getScore();
//save result to Excel using Apache POI
...
...
}
}
}
Please suggest me a time-efficient architecture to accomplish this task. Shall I store the blobs in a NoSQL DB or is there any alternate way to do this?
Upvotes: 4
Views: 1325
Reputation: 5225
I'd consider adding some simple profiling to your code as a first step. Profiling libraries are great, but can be a little intimidating. All you really need to get started is:
public void run(){
long sumCompare = 0;
long sumSave = 0
for(int i = startIndex; i<= lastIndex; i++) {
for(int j=0;j<15000;j++){
final long compareStart = System.currentTimeMillis();
compare.compareIris(bloblist.get(i),bloblist.get(j));
score= compare.getScore();
final long compareEnd = System.currentTimeMillis();
compareSum += (compareEnd - compareStart);
//save result to Excel using Apache POI
...
...
final long saveEnd = System.currentTimeMillis();
saveSum += (saveEnd - compareEnd);
}
}
System.out.println(String.format("Compare: %d; Save: %d", sumCompare, sumSave);
}
Maybe run this over a 100x100 grid instead to get a rough idea of where the bulk of your runtime is.
If it's the save step, I'd strongly recommend using a database as an intermediate step between computing the score and exporting it to a spreadsheet. A NoSQL database would work, although I'd also encourage you to look at something like SQLite, just for simplicity's sake. (Many NoSQL databases are designed to offer advantages across a cluster of database nodes while working with very large datasets; if you're storing write-only data on one node, SQL may be your best bet.)
If the bottleneck is the compute step, it will be more difficult to improve performance. If the blobs don't all fit comfortably in RAM along with whatever RAM the comparisons consume, you may be paying the price of swapping this data onto and off-of disk. You may see a small improvement by having each thread take work "off of a queue", rather than starting with a pre-assigned block:
final int processors = Runtime.getRuntime().availableProcessors();
final ExecutorService executor = Executors.newFixedThreadPool(processors);
final AtomicLong nextCompare = new AtomicLong(0);
for(int i =0; i<processors; i++) {
Runnable task = new Thread(bloblist, nextCompare);
executor.execute(task);
}
executor.shutdown();
public void run(){
while (true) {
final long taskNum = nextCompare.getAndIncrement();
if (taskNum >= 15000 * 15000) {
return;
}
final long i = Math.floor(taskNum/15000);
final long j = taskNum % 15000;
compare.compareIris(bloblist.get(i),bloblist.get(j));
score = compare.getScore();
// Save score, etc.)
}
}
This will result in all threads working on blobs stored relatively close together in memory. In this way, no thread is evicting data from the cache that another thread will require in the near future. You are, however, paying the price of locking the AtomicLong
; if memory thrashing wasn't your issue, this will likely be a bit slower.
Upvotes: 1