Héctor

Reputation: 26044

Persistent storage in JanusGraph using Cassandra

I'm experimenting with JanusGraph using a Cassandra backend, but I have some questions.

I have a Cassandra server running on my machine (using Docker) and in my API I have this code:

 GraphTraversalSource g = JanusGraphFactory.build()
        .set("storage.backend", "cql")
        .set("storage.hostname", "localhost")
        .open()
        .traversal();

Then, through my API, I'm saving and fetching data using Gremlin. It works fine, and I can see the data saved in the Cassandra database.

The problem comes when I restart my API and try to fetch data. The data is still stored in Cassandra, but the JanusGraph query returns an empty result. Why?

Do I need to load backend storage data into memory or something like that? I'm trying to understand how it works.

EDIT

This is how I add an item:

 Vertex vertex = g.addV("User")
          .property("username", username)
          .property("email", email)
          .next();

And to fetch all:

List<Vertex> all = g.V().toList();

Upvotes: 3

Views: 1128

Answers (1)

Florian Hockmann

Reputation: 2809

Commit your Transactions

You are currently using JanusGraph embedded as a library in your application, which gives you access to the full JanusGraph API. This means that you have to manage transactions yourself, which includes committing them in order to persist your modifications to the graph.

You can simply do this by calling:

g.tx().commit();

after you have iterated your traversal with the modifications (the addV() traversal in your case).

Without the commit, the changes are only available locally in your transaction. When you restart your Docker container(s), all data will be lost as you haven't committed it.
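As a minimal sketch of the commit pattern described above, the write from the question could be wrapped in a small helper that commits on success and rolls back on failure. The class name UserWriter and the method addUser are hypothetical names chosen for illustration; g is assumed to be the embedded traversal source from the question, and the JanusGraph and TinkerPop libraries are assumed to be on the classpath.

```java
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;

// Hypothetical helper, not part of JanusGraph or TinkerPop.
public final class UserWriter {

    private UserWriter() {
    }

    /**
     * Adds a "User" vertex and commits the transaction so that the change
     * is persisted to the storage backend. Returns the id of the new vertex.
     */
    public static Object addUser(GraphTraversalSource g, String username, String email) {
        try {
            Vertex vertex = g.addV("User")
                    .property("username", username)
                    .property("email", email)
                    .next();               // iterate the traversal so it actually runs
            Object id = vertex.id();
            g.tx().commit();               // persist the modification
            return id;
        } catch (RuntimeException e) {
            g.tx().rollback();             // discard the partial change on failure
            throw e;
        }
    }
}
```

It could then be called as UserWriter.addUser(g, username, email) wherever the API currently runs the addV() traversal directly.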

The Recommended Approach: Connecting via Remote

If you don't have a good reason to embed JanusGraph as a library in your JVM application, then it's recommended to deploy it independently as JanusGraph Server to which you can send your traversals for execution. This has the benefit that you can scale JanusGraph independently of your application and also that you can use it from non-JVM languages.

JanusGraph Server then also manages transactions for you transparently by executing each traversal in its own transaction. If the traversal succeeds, the changes are committed; if an exception occurs, they are rolled back automatically.

The JanusGraph docs contain a section about how to connect to JanusGraph Server from Java, but the important part is this code, which creates a graph traversal source g connected to your JanusGraph Server(s):

Graph graph = EmptyGraph.instance();
GraphTraversalSource g = graph.traversal().withRemote("conf/remote-graph.properties");
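For orientation, the conf/remote-graph.properties file referenced in that snippet might look roughly like the sketch below. The three keys are the ones the TinkerPop driver reads for a remote connection; the path conf/remote-objects.yaml is an assumption, and that second file (not shown) would hold the host, port, and serializer settings, which depend on your TinkerPop and JanusGraph versions.

```properties
# conf/remote-graph.properties (sketch; verify against your TinkerPop version)
gremlin.remote.remoteConnectionClass=org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
# Driver settings file with hosts, port and serializer; this path is an assumption
gremlin.remote.driver.clusterFile=conf/remote-objects.yaml
# Name of the traversal source that the server exposes
gremlin.remote.driver.sourceName=g
```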

You can of course also start JanusGraph Server as a Docker container:

docker run --rm janusgraph/janusgraph:latest

More information about the JanusGraph Docker image and how it can be configured to connect to your Cassandra backend can be found here.
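As a rough sketch of such a setup, a Docker Compose file could start Cassandra and JanusGraph Server together. The service names are hypothetical, and the environment variables JANUS_PROPS_TEMPLATE and janusgraph.storage.hostname are written here as I recall them from the image's documentation, so please check them against the version you use.

```yaml
# docker-compose.yml (sketch; service names are assumptions)
services:
  cassandra:
    image: cassandra
  janusgraph:
    image: janusgraph/janusgraph:latest
    environment:
      # Selects a CQL-based configuration template (assumed variable name)
      JANUS_PROPS_TEMPLATE: cql
      # Points the storage backend at the Cassandra service above
      janusgraph.storage.hostname: cassandra
    ports:
      # Gremlin Server port that your application connects to
      - "8182:8182"
    depends_on:
      - cassandra
```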


The part below is no longer directly relevant to this question, given the comments on the first version of my answer. I am still leaving it here in case others have a similar problem where this could actually be the cause.

Persistent Storage with Docker Containers

JanusGraph stores the data in your storage backend which is Cassandra in your case. That means that you have to ensure that Cassandra persists the data. If you start Cassandra in a Docker container, then you have to mount a volume where Cassandra stores the data to persist it beyond restarts of the container. Otherwise, the data will be lost once you stop the Cassandra container.

To do this, you can start the Cassandra container for example like this:

docker run -v /my/own/datadir:/var/lib/cassandra -d cassandra

where /my/own/datadir is the directory on your host system where you want the Cassandra data to be stored. This is explained in the docs of the official Cassandra Docker image under Caveats > Where to Store Data.

Upvotes: 8
