Graph database vs. RDB with link/bridge tables

Question

I work in the fraud/AML (anti-money laundering) field, and we are exploring using a graph database to unearth hidden connections and links. I've read a fair amount about graph databases lately (mostly Neo4j, but I think the concepts are similar across different products?), and from what I can tell, they seem well suited to this domain. The issue is that I'm having a hard time getting buy-in from tech management. They seem to think we can do the same things with our existing data reporting model, which lives in Hadoop and is essentially a data warehouse with specific tables that provide many-to-many links between the core tables (I believe Kimball calls them 'bridge' tables?).

In a way, these link tables seem to provide the same functionality as the relationship tables in a graph DB. Given that we have already constructed the link tables in Hadoop, would a graph database provide any performance advantage for the kinds of things we may want to do (e.g. "How is Customer A connected to Customer B?"), or have we largely negated any performance advantage of a graph DB by building all of the link tables?

djhallx · Accepted Answer

On similar hardware platforms, a relational database will never be able to keep up with a well-constructed graph database when performing "path-between" queries. Never.

Every graph database product has its own internal storage representation, but they are all fundamentally designed to store nodes and edges and to support navigational queries across those nodes and edges. Without the addition of new graph-support features, a relational database will struggle to provide graph-like capabilities.

The other advantage of using a native graph database is that the graph query languages are specifically designed to support path-between queries. In Objectivity/DB, a massively scalable and distributable object/graph database, we can use the DO query language to find all of the paths between two entities up to a specified number of degrees apart in milliseconds or seconds. A DO query might look like the following:

Match p = (:Account { accountId = "1234"})
          -[*..100]->
          (:Account { accountId = "5678"})
          return p;

Here, we are saying: Find all paths (p) from Account 1234 to Account 5678, where they are between 1 and 100 degrees apart.
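
For readers who don't know DO, here is a rough Python sketch of what a path-between query of this kind computes. It is not how Objectivity/DB (or any other product) implements it. The `find_paths` function, the `transfers` edge list, and the intermediate account names are all hypothetical, and the sketch assumes that only simple paths (no account visited twice) are wanted.

```python
from collections import defaultdict


def find_paths(edges, source, target, max_hops):
    """Return every simple path from source to target using at most max_hops edges."""
    # Adjacency list: each node points directly at its outgoing neighbors,
    # so following an edge is a dictionary lookup rather than a table join.
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)

    paths = []

    def walk(node, path):
        if node == target:
            paths.append(list(path))
            return
        if len(path) - 1 >= max_hops:  # hop budget used up
            return
        for neighbor in adjacency[node]:
            if neighbor not in path:  # skip cycles
                path.append(neighbor)
                walk(neighbor, path)
                path.pop()

    walk(source, [source])
    return paths


# Hypothetical directed links between accounts.
transfers = [
    ("1234", "A"), ("A", "5678"),
    ("1234", "B"), ("B", "C"), ("C", "5678"), ("C", "1234"),
]
paths = find_paths(transfers, "1234", "5678", max_hops=100)
```

With this sample data, `paths` holds two routes: one through `A` and one through `B` and `C`. The link from `C` back to `1234` is ignored because it would revisit an account already on the path.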

Expressing this path-between query in a relational database would be much more complicated (unless graph features have been added to the database), and executing it there would be much more resource intensive (memory, CPU, I/O).
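
To give a feel for the relational side, here is a minimal sketch of a path-between query over a link table, written as a recursive common table expression. It uses SQLite through Python's `sqlite3` module purely as a stand-in; the `account_link` table, its columns, and the sample rows are hypothetical. Whether the SQL engine on a given Hadoop stack supports recursive queries at all would need checking. The sketch also assumes account IDs never contain the `/` separator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_link (src TEXT, dst TEXT);  -- hypothetical bridge table
    INSERT INTO account_link VALUES
        ('1234', 'A'), ('A', '5678'),
        ('1234', 'B'), ('B', 'C'), ('C', '5678'), ('C', '1234');
""")

# Each level of recursion is another join against the link table.
# The path is carried along as a '/'-delimited string so that cycles
# can be detected with a substring search.
PATH_QUERY = """
WITH RECURSIVE walk(node, path, hops) AS (
    SELECT :source, '/' || :source || '/', 0
    UNION ALL
    SELECT l.dst, w.path || l.dst || '/', w.hops + 1
    FROM walk AS w
    JOIN account_link AS l ON l.src = w.node
    WHERE w.hops < :max_hops
      AND w.node <> :target                           -- stop once the target is reached
      AND instr(w.path, '/' || l.dst || '/') = 0      -- do not revisit an account
)
SELECT path, hops FROM walk WHERE node = :target ORDER BY hops;
"""

rows = conn.execute(
    PATH_QUERY, {"source": "1234", "target": "5678", "max_hops": 100}
).fetchall()
```

Even in this toy form, the query has to keep every partial path as an intermediate row and re-join the link table once per hop. That is the kind of memory and I/O cost the paragraph above refers to.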

If you have the opportunity to explore a graph database for your project, make sure you understand your scalability and data distribution requirements. That information will be key to selecting the right product.

Disclaimer: I am the Director of Field Operations for Objectivity.
