imriqwe

Reputation: 1455

Migrating existing Titan graph data when the schema changes

I have a 25TB Titan graph DB, hosted on an HBase table.

The graph holds data of my users, such as interests, friendships etc. I also keep all of the data on an SQL relational DB.

I am working on a new feature that requires me to change the schema of the User vertex, splitting it into multiple smaller vertices and edges.

How should I handle such a case? What is the best practice for such a huge change to Titan data (billions of vertices and edges)? Should I think about rebuilding the graph from the SQL data, or should I migrate the existing data?

Upvotes: 0

Views: 127

Answers (1)

Sebastian Good

Reputation: 6360

By and large the approach for these sorts of very large schema changes is independent of database technology. Unless you can afford to take the whole system offline while you make the change, you'll need to migrate the data over time, which means you'll have two versions of the data around at the same time. Without looking at the details of your suggested change, it's hard to say what your best strategy is.

If your plan is "just" to take each user vertex and split it into several smaller interconnected vertices, I'll assume that in both cases you still have a canonical user vertex you can find in a search, e.g. user 5 will be represented by either one "big vertex" or one "small vertex connected to other vertices".

Create a process which creates the "small vertex" copy of each "big vertex", but keep the "big vertex" around, too. This will take time to run, but it will eventually finish. Edits to vertices will have to update both "big" and "small". Do your searches on just the "big" ones, since they'll still be around.
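As a rough illustration of the backfill-plus-dual-write step, here is a minimal Python sketch. It uses an in-memory stand-in for the graph, not the Titan API: `InMemoryGraph`, `write_small`, `update_user`, `backfill`, and the `name`/`interests` properties are all hypothetical names chosen for the example, and a real job would also have to deal with concurrent writers and batching against HBase.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryGraph:
    """Hypothetical stand-in for the graph store; this is NOT the Titan API."""
    big: dict = field(default_factory=dict)             # user_id -> all properties
    small: dict = field(default_factory=dict)           # user_id -> slim user vertex
    interest_edges: dict = field(default_factory=dict)  # user_id -> set of interests


def write_small(graph, user_id, props):
    """(Re)build the split representation for one user; safe to repeat."""
    graph.small[user_id] = {"name": props["name"]}
    graph.interest_edges[user_id] = set(props.get("interests", []))


def update_user(graph, user_id, props):
    """Dual write: every live edit updates both "big" and "small"."""
    graph.big[user_id] = dict(props)
    write_small(graph, user_id, props)


def backfill(graph, batch_size=1000):
    """Copy every "big" vertex that has no "small" copy yet, batch by batch.

    Returns the number of users copied in this run.
    """
    pending = [uid for uid in graph.big if uid not in graph.small]
    copied = 0
    for start in range(0, len(pending), batch_size):
        for uid in pending[start:start + batch_size]:
            write_small(graph, uid, graph.big[uid])
            copied += 1
    return copied
```

Because `write_small` overwrites rather than appends, the backfill can be stopped and restarted, and a user touched by a live edit in the meantime is simply skipped on the next run.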

After some time, you will have a "small" vertex for every "big" vertex. Then you can deploy code which only does searches for the "small" vertices. After that is proven successful, you can retire the code which simultaneously edits both, and then of course run another job which deletes all the "big" vertices.
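The cutover and cleanup could be sketched along these lines. This is again a Python illustration over plain dictionaries rather than Titan calls; `find_user`, `missing_small_copies`, `delete_big_vertices`, and the `read_from_small` flag are hypothetical names, and the sketch assumes the flag is flipped by a deploy.

```python
def find_user(big, small, user_id, read_from_small):
    """Read path: a deploy-time flag picks which representation is searched."""
    source = small if read_from_small else big
    return source.get(user_id)


def missing_small_copies(big, small):
    """User ids that still exist only as a "big" vertex."""
    return sorted(uid for uid in big if uid not in small)


def delete_big_vertices(big, small):
    """Final cleanup job: refuse to delete until every user has a "small" copy.

    Returns the number of "big" vertices removed.
    """
    missing = missing_small_copies(big, small)
    if missing:
        raise RuntimeError(f"{len(missing)} users not migrated yet")
    deleted = len(big)
    big.clear()
    return deleted
```

Checking `missing_small_copies` before flipping the flag, and again inside the delete job, is a cheap guard against retiring the "big" vertices too early.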

It's a pain, but when you have a large amount of data in a live system that can't be taken offline, it's the only approach you can take.

Upvotes: 3
