Reputation: 29727
I have quite a big network in a CSV file. It contains 450k nodes and 45,000,000 relationships. From what I've read in the Neo4j documentation, this type of database can handle a network that large.
I've also read that I can use either an embedded server or a standalone one.
My question is what is the difference between them? I would like to have a server which holds its database state.
My second question: I can use either the REST API or the Java API to perform operations on the database.
What is the difference in performance? I would like for example to have as an output all nodes levels.
Is it possible to load graph from CSV?
What is the best solution for my problem?
Upvotes: 3
Views: 3515
Reputation: 41676
Here is the code you would use with the Neo4j-Batch-Inserter to import the call records. Instead of generating the data on the fly, you would of course read it from a file and split each record accordingly.
import org.apache.commons.io.FileUtils;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.index.BatchInserterIndex;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.index.impl.lucene.LuceneBatchInserterIndexProvider;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import static org.neo4j.helpers.collection.MapUtil.map;

public class CallRecordImportBatch {
    public static final int MILLION = 1000000;
    public static final int BATCH_SIZE = MILLION;
    public static final int CALLS = 45 * MILLION;
    public static final int USERS = CALLS / 100;
    public static final File STORE_DIR = new File("target/calls_" + CALLS);
    private static final Random rnd = new Random();

    enum MyRelationshipTypes implements RelationshipType {CALLED}

    // Generates test data; a real import would take the number from the CSV record.
    private static String randomPhoneNumber() {
        final int phoneNumber = rnd.nextInt(USERS);
        return String.format("%013d", phoneNumber);
    }

    public static void main(String[] args) throws IOException {
        long time = System.currentTimeMillis();
        CallRecordImportBatch importBatch = new CallRecordImportBatch();
        importBatch.createGraphDatabase();
        System.out.println((System.currentTimeMillis() - time) + " ms: " + "Create Database");
    }

    private BatchInserterImpl db;
    private BatchInserterIndex phoneNumberIndex;

    private void createGraphDatabase() throws IOException {
        // Start from an empty store directory.
        if (STORE_DIR.exists()) FileUtils.cleanDirectory(STORE_DIR);
        STORE_DIR.mkdirs();
        db = new BatchInserterImpl(STORE_DIR.getAbsolutePath(),
                MapUtil.stringMap("cache_type", "weak",
                        "neostore.nodestore.db.mapped_memory", "500M",
                        "neostore.relationshipstore.db.mapped_memory", "2000M",
                        "neostore.propertystore.db.mapped_memory", "1000M",
                        "neostore.propertystore.db.strings.mapped_memory", "0M",
                        "neostore.propertystore.db.arrays.mapped_memory", "0M"
                ));
        final LuceneBatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider(db);
        phoneNumberIndex = indexProvider.nodeIndex("Caller", MapUtil.stringMap("type", "exact"));
        phoneNumberIndex.setCacheCapacity("Caller", 1000000);
        long time = System.currentTimeMillis();
        // Maps phone number -> node id, so each caller node is created only once.
        Map<String, Long> cache = new HashMap<String, Long>(USERS);
        try {
            for (int call = 0; call < CALLS; call++) {
                // Print progress and timing once per batch.
                if (call % BATCH_SIZE == 0) {
                    System.out.println((System.currentTimeMillis() - time) + " ms: " + String.format("calls %d callers %d", call, cache.size()));
                    time = System.currentTimeMillis();
                }
                final String callerNumber = randomPhoneNumber();
                final int duration = (int) (System.currentTimeMillis() % 3600);
                final String calleeNumber = randomPhoneNumber();
                long caller = getOrCreateCaller(cache, callerNumber);
                long callee = getOrCreateCaller(cache, calleeNumber);
                db.createRelationship(caller, callee, MyRelationshipTypes.CALLED, map("duration", duration));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println((System.currentTimeMillis() - time) + " ms: " + String.format("calls %d callers %d", CALLS, cache.size()));
        indexProvider.shutdown();
        db.shutdown();
    }

    // Returns the node id for a phone number, creating the node on first sight.
    private Long getOrCreateCaller(Map<String, Long> cache, String number) {
        final Long callerId = cache.get(number);
        if (callerId != null) return callerId;
        long caller = createCaller(number);
        cache.put(number, caller);
        return caller;
    }

    // Creates the node and adds it to the "Caller" index under its number.
    private long createCaller(String number) {
        long caller = db.createNode(map("Number", number));
        phoneNumberIndex.add(caller, map("Number", number));
        phoneNumberIndex.flush();
        return caller;
    }
}
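The "read it from a file and split each record" step could look roughly like the sketch below. It uses only the JDK and leaves Neo4j out, so it runs on its own. The column layout (caller,callee,duration), the class name CallRecordCsvReader and the CallHandler callback are assumptions for illustration and not part of the code above. In the importer, getOrCreate would call db.createNode(...) as getOrCreateCaller does, and the callback would call db.createRelationship(...).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Sketch: stream call records from CSV, one relationship per line. */
class CallRecordCsvReader {

    /** Hypothetical callback; in the importer it would wrap db.createRelationship(...). */
    interface CallHandler {
        void onCall(long callerId, long calleeId, int duration);
    }

    // Maps each phone number to a node id, like the cache map in the importer.
    private final Map<String, Long> nodeIds = new HashMap<String, Long>();
    private long nextId = 0;

    // Stand-in for getOrCreateCaller(); here ids are just a running counter.
    private long getOrCreate(String number) {
        Long id = nodeIds.get(number);
        if (id == null) {
            id = nextId++;
            nodeIds.put(number, id);
        }
        return id;
    }

    /** Reads lines of the assumed form caller,callee,duration; returns the record count. */
    long read(BufferedReader in, CallHandler handler) {
        long count = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue; // skip blank lines
                String[] parts = line.split(",");
                if (parts.length < 3) {
                    throw new IllegalArgumentException("Malformed record: " + line);
                }
                long caller = getOrCreate(parts[0].trim());
                long callee = getOrCreate(parts[1].trim());
                int duration = Integer.parseInt(parts[2].trim());
                handler.onCall(caller, callee, duration);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    /** Number of distinct phone numbers seen so far. */
    int distinctNumbers() {
        return nodeIds.size();
    }

    public static void main(String[] args) {
        // Inline sample data; a real run would wrap a FileReader instead.
        String sample = "0001,0002,120\n0002,0003,45\n0001,0003,300\n";
        CallRecordCsvReader reader = new CallRecordCsvReader();
        long calls = reader.read(new BufferedReader(new StringReader(sample)),
                (caller, callee, duration) ->
                        System.out.println(caller + " -CALLED-> " + callee + " (" + duration + "s)"));
        System.out.println(calls + " calls, " + reader.distinctNumbers() + " callers");
    }
}
```

Reading line by line keeps memory use independent of file size; only the number-to-id map grows, and with 450k nodes that should fit in the heap comfortably. A plain split(",") does not handle quoted fields, so a file with embedded commas would need a proper CSV parser.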
Upvotes: 5
Reputation: 41676
There is a Java API for performing REST operations:
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-rest-graphdb</artifactId>
<version>1.5-SNAPSHOT</version>
<!-- or the last milestone: <version>1.5.M02.U1</version> -->
</dependency>
What do you mean by:
"I would like for example to have as an output all nodes levels."
Regarding your other questions - what does your data model look like?
Upvotes: 1
Reputation: 4361
The embedded database sits in the same process as your application, meaning that there's no network overhead (so embedded is much faster). Of course both keep the data, that's why you have a database to begin with :-)
You can even use the embedded mode and standalone server at the same time, see: http://docs.neo4j.org/chunked/snapshot/server-embedded.html
For loading lots of data in one go, the BatchInserter should be used, see: http://docs.neo4j.org/chunked/milestone/indexing-batchinsert.html
Upvotes: 2