Reputation: 6039
I tested MapDB with integer keys and string values, inserting 10,000,000 elements. Here is what I see:
Processed 1.0E-5 percent of the data / time so far = 0 seconds
Processed 1.00001 percent of the data / time so far = 7 seconds
Processed 2.00001 percent of the data / time so far = 14 seconds
Processed 3.00001 percent of the data / time so far = 20 seconds
Processed 4.00001 percent of the data / time so far = 26 seconds
Processed 5.00001 percent of the data / time so far = 33 seconds
Processed 6.00001 percent of the data / time so far = 39 seconds
Processed 7.00001 percent of the data / time so far = 45 seconds
Processed 8.00001 percent of the data / time so far = 53 seconds
Processed 9.00001 percent of the data / time so far = 60 seconds
Processed 10.00001 percent of the data / time so far = 66 seconds
Processed 11.00001 percent of the data / time so far = 73 seconds
Processed 12.00001 percent of the data / time so far = 80 seconds
Processed 13.00001 percent of the data / time so far = 88 seconds
Processed 14.00001 percent of the data / time so far = 96 seconds
Processed 15.00001 percent of the data / time so far = 102 seconds
Processed 16.00001 percent of the data / time so far = 110 seconds
Processed 17.00001 percent of the data / time so far = 119 seconds
Processed 18.00001 percent of the data / time so far = 127 seconds
Processed 19.00001 percent of the data / time so far = 134 seconds
Processed 20.00001 percent of the data / time so far = 141 seconds
Processed 21.00001 percent of the data / time so far = 149 seconds
Processed 22.00001 percent of the data / time so far = 157 seconds
Processed 23.00001 percent of the data / time so far = 164 seconds
Processed 24.00001 percent of the data / time so far = 171 seconds
Processed 25.00001 percent of the data / time so far = 178 seconds
....
About 2.5 million entries are put into the map within 178 seconds, so the full 10 million would take around 12 minutes.
Then I switched to more complicated values and the speed dropped drastically (it took 3-4 days to add all 10,000,000 entries to the map). Does anyone have suggestions for speeding up MapDB insertions? Any prior speed-related experience/problems with MapDB?
There is also an evaluation here: http://kotek.net/blog/3G_map
Update: I used the usual procedure for creating the map. Here is pseudocode:
DB db = DBMaker.newFileDB()....;
... map = db.getHashMap(...);
loop (...) {
    map.put(...);
}
db.commit();
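For reference, a fully runnable version of what I do looks roughly like the sketch below (the file name, map name, and value format are placeholders I made up, assuming the MapDB 1.x API used above):

import java.io.File;
import java.util.Map;
import org.mapdb.DB;
import org.mapdb.DBMaker;

public class InsertBenchmark {
    public static void main(String[] args) {
        // Open a file-backed DB; "benchmark.db" and "testMap" are placeholder names.
        DB db = DBMaker.newFileDB(new File("benchmark.db")).make();
        Map<Integer, String> map = db.getHashMap("testMap");

        final int total = 10000000;
        long start = System.currentTimeMillis();
        for (int i = 0; i < total; i++) {
            map.put(i, "value" + i);
            if (i % (total / 100) == 0) {   // report progress every 1 percent
                System.out.println("Processed " + (100.0 * i / total)
                        + " percent of the data / time so far = "
                        + (System.currentTimeMillis() - start) / 1000 + " seconds");
            }
        }
        db.commit();
        db.close();
    }
}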
Upvotes: 1
Views: 3406
Reputation: 1084
MapDB author here.
For a start, use the specialized serializers; they are a bit faster:

Map m = db.createHashMap("a").keySerializer(Serializer.LONG).valueSerializer(Serializer.LONG).makeOrGet();
Next, for bulk import I would recommend using the Data Pump with a TreeMap. An example is here: https://github.com/jankotek/MapDB/blob/master/src/test/java/examples/Huge_Insert.java
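In rough outline, such a bulk import might look like the sketch below (my assumptions: the MapDB 1.x pump API, with made-up file/map names and data; in 1.x the pump expects the source iterator to supply keys in reverse order):

import java.io.File;
import java.util.Iterator;
import org.mapdb.*;

public class PumpImport {
    public static void main(String[] args) {
        // Disable the write-ahead log during the bulk import for speed.
        DB db = DBMaker.newFileDB(new File("pumped.db"))
                .transactionDisable()
                .make();

        // The 1.x Data Pump expects entries in REVERSE key order.
        final long count = 10000000L;
        Iterator<Fun.Tuple2<Long, String>> source = new Iterator<Fun.Tuple2<Long, String>>() {
            private long next = count - 1;
            public boolean hasNext() { return next >= 0; }
            public Fun.Tuple2<Long, String> next() {
                long k = next--;
                return Fun.t2(k, "value" + k);
            }
            public void remove() { throw new UnsupportedOperationException(); }
        };

        BTreeMap<Long, String> map = db.createTreeMap("map")
                .pumpSource(source)
                .keySerializer(BTreeKeySerializer.ZERO_OR_POSITIVE_LONG)
                .make();

        db.commit();
        db.close();
    }
}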
Upvotes: 3
Reputation: 4574
From the official site of MapDB I see the following:
Concurrent - MapDB has record level locking and state-of-art concurrent engine. Its performance scales nearly linearly with number of cores. Data can be written by multiple parallel threads.
I thought, that's it, and wrote a simple test:
package com.stackoverflow.test;

import java.io.File;
import java.util.Date;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.mapdb.*;

public class Test {

    private static final int AMOUNT = 100000;

    // Fills a disjoint key range [fromElement, toElement) and signals the latch when done.
    private static final class MapAddingThread implements Runnable {

        private final int fromElement;
        private final int toElement;
        private final Map<Integer, String> map;
        private final CountDownLatch countDownLatch;

        public MapAddingThread(CountDownLatch countDownLatch, Map<Integer, String> map, int fromElement, int toElement) {
            this.countDownLatch = countDownLatch;
            this.map = map;
            this.fromElement = fromElement;
            this.toElement = toElement;
        }

        public void run() {
            for (int i = this.fromElement; i < this.toElement; i++) {
                map.put(i, Integer.toString(i));
            }
            this.countDownLatch.countDown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // int cores = 1;
        int cores = Runtime.getRuntime().availableProcessors();
        CountDownLatch countDownLatch = new CountDownLatch(cores);
        ExecutorService executorService = Executors.newFixedThreadPool(cores);
        int part = AMOUNT / cores;
        long startTime = new Date().getTime();
        System.out.println("Starting test in " + cores + " threads");
        DB db = DBMaker.newFileDB(new File("testdb5")).cacheDisable().closeOnJvmShutdown().make();
        Map<Integer, String> map = db.getHashMap("collectionName5");
        for (int i = 0; i < cores; i++) {
            executorService.execute(new MapAddingThread(countDownLatch, map, i * part, (i + 1) * part));
        }
        countDownLatch.await();
        long endTime = new Date().getTime();
        System.out.println("Filling elements takes : " + (endTime - startTime));
        db.commit();
        System.out.println("Commit takes : " + (new Date().getTime() - endTime));
        db.close();
        executorService.shutdown(); // allow the JVM to exit
    }
}
And got these results:
Starting test in 4 threads
Filling elements takes : 4424
Commit takes : 901
Then I ran the same code in a single thread; the only change is:

int cores = 1;
// int cores = Runtime.getRuntime().availableProcessors();
And got these results:
Starting test in 1 threads
Filling elements takes : 3639
Commit takes : 924
So, if I am doing everything correctly, it seems that MapDB does not scale with the number of cores.
The only things you can play with are:
API methods (e.g. toggling encryption, caching, tree map vs. hash map usage; see the sketch below)
Trying to change the capacity of the map via reflection
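For illustration, here is a sketch of those knobs on the DBMaker builder (assuming MapDB 1.x method names, which may differ in other versions; file and map names are made up):

import java.io.File;
import java.util.Map;
import org.mapdb.DB;
import org.mapdb.DBMaker;

public class TuningVariants {
    public static void main(String[] args) {
        // Trade durability for speed: no write-ahead log, memory-mapped file.
        // Encryption can also be toggled here (at a speed cost), e.g. .encryptionEnable("password").
        DB fast = DBMaker.newFileDB(new File("fast.db"))
                .transactionDisable()
                .mmapFileEnableIfSupported()
                .make();

        // The configuration used in the test above: cache disabled.
        DB plain = DBMaker.newFileDB(new File("plain.db"))
                .cacheDisable()
                .closeOnJvmShutdown()
                .make();

        // Tree map vs. hash map usage: both come from the same DB handle.
        Map<Integer, String> hash = fast.getHashMap("h");
        Map<Integer, String> tree = fast.getTreeMap("t");
        hash.put(1, "one");
        tree.put(1, "one");

        fast.close();
        plain.close();
    }
}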
Upvotes: 0