ifxdev
ifxdev

Reputation: 23

Azure CosmosDB Graph traversal performance issue

Imagine simple transactions on a bank account as a graph database. With events like this:

  1. +$1000 cash input.
  2. -$10 for buying a book by VISA.
  3. +$100 cash for selling an old bike.
  4. -$50 cash for buying groceries.

In a graph structure we could define the nodes as the transactions with the properties:

The edges would then be pointing to the previous transaction. We could have other edges pointing to other accounts (for transfers between accounts), owners etc. but for simplicity we have this structure.

    g.addV('transactions').property('id','1').property('time',0).property('delta',1000).property('description','cash input')
g.addV('transactions').property('id','2').property('time',1).property('delta,-10).property('description','for buying a book by VISA')
g.V('2').addE('previous').to(g.V('1'))
g.addV('transactions').property('id','3').property('time',2).property('delta',100).property('description','cash for selling an old bike.')
g.V('3').addE('previous').to(g.V('2'))
g.addV('transactions').property('id','4').property('time',3).property('delta',-50).property('description','cash for buying groceries')
g.V('4').addE('previous').to(g.V('3'))

Now to get the current inventory of this account we would just traverse the previous edge from the latest transaction, until specific date, or to the beginning, like this:

g.V('4').emit().repeat(out('previous')).until(has('time',0)).properties('delta').value().sum()

==>1040

Which is all good and fast for 4 transactions. BUT, when doing this for 100 000 transactions it takes around 8 minutes, and with more complex operations or more data it takes even longer.

In my test case i have set up an Azure Cosmos-DB with the Graph API and a throughput of 2000 RU/s.

Since I am fairly new to Graph databases and queries, I realize that there might be faster and better ways of doing this, and ways to optimize this, that I don't know about. Maybe even graph database is not the right tool for this job?

What I want to achieve here is a reasonable fast query into the transactions, that may fork into multiple accounts, and many more events.

How can I make this work better?

Upvotes: 1

Views: 696

Answers (1)

Daniel Kuppitz
Daniel Kuppitz

Reputation: 10904

Why don't you just add a current property for each transaction vertex? This would still keep the history as you have it now, but also provide much faster access to the current inventory value (at any given point in time). Also, should the value of any transaction change afterwards, it would be easy to update all newer transactions accordingly (but this would only be a long running write query, reads would would still be blazing fast).

Keep in mind, that it's generally a bad idea to have so many hops in an OLTP query.

Upvotes: 3

Related Questions