Reputation: 1869
Regarding inserting vertices/edges into Titan graph DB 1.0: I am doing batch insertion, meaning the required sub-graph is added to a single transaction, which is committed only after the whole sub-graph has been inserted. My problem is that Titan shows a strange performance drop when I keep adding vertices/edges to the same transaction (before committing it). At first the overall throughput is 400 edges/vertices per second, but it drops to less than 1 edge/vertex per minute, depending on the size of the sub-graph! (Please note that the performance drop occurs while adding/updating edges/vertices through the transaction; the storage backend is not involved yet.)
I changed the transaction cache and db-cache settings, and the performance drop still exists in all scenarios. In my tests the performance drop stops only when I commit the transaction frequently, which results in inconsistency in multi-threaded situations and is not acceptable for my application. I would be really grateful if someone could help me overcome this.
The results of different scenarios so far:
I see some minor GC activity in all scenarios, from hundreds of vertices/edges to millions of vertices/edges.
Giving more heap space changed neither the GC activity nor the throughput drop. Even 4 GB of heap for 200k vertices/edges did not help; moreover, only 1 GB of the 4 GB heap is used in the 200k scenario.
Changing the transaction cache did not affect the throughput drop.
Increasing db-cache improved the insertion time at first, but the throughput dropped again.
Changing ids.flush to false did not affect the throughput drop.
Some information about the whole operations:
Before an edge/vertex is added to the transaction, I check that no edge/vertex with the same identifier already exists in the database; if the edge/vertex is completely new, it is added to the transaction along with all its properties. I did some investigation, and it seems that the time of the "get" part of my operation (the check for existence of the edge/vertex) increases, while the insertion time is almost constant.
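For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of timing the "get" phase and the "insert" phase separately. It does not use Titan at all: the `HashMap` is only a stand-in for the open transaction, and `PhaseTimer`, `getOrCreate` and the phase names are hypothetical names made up for this illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch only: the HashMap stands in for the open transaction, it is NOT Titan.
class GetOrCreateTimingDemo {

    // Look the identifier up first; create the element only if it is new.
    // Each phase is timed on its own so the two costs can be compared.
    static Map<String, Object> getOrCreate(Map<String, Map<String, Object>> tx,
                                           String id, PhaseTimer timer) {
        Map<String, Object> existing = timer.time("lookup", () -> tx.get(id));
        if (existing != null) {
            return existing;
        }
        return timer.time("insert", () -> {
            Map<String, Object> props = new HashMap<>();
            props.put("identifier", id);
            tx.put(id, props);
            return props;
        });
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        Map<String, Map<String, Object>> tx = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            // Some identifiers repeat, so not every lookup leads to an insert.
            getOrCreate(tx, "v" + (i % 90_000), timer);
        }
        System.out.println(timer.report());
    }
}

// Accumulates total nanoseconds and call counts per named phase.
class PhaseTimer {
    private final Map<String, long[]> totals = new LinkedHashMap<>(); // {nanos, calls}

    <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long[] t = totals.computeIfAbsent(phase, k -> new long[2]);
            t[0] += System.nanoTime() - start;
            t[1]++;
        }
    }

    long calls(String phase) {
        long[] t = totals.get(phase);
        return t == null ? 0 : t[1];
    }

    long nanos(String phase) {
        long[] t = totals.get(phase);
        return t == null ? 0 : t[0];
    }

    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, long[]> e : totals.entrySet()) {
            long[] t = e.getValue();
            sb.append(e.getKey()).append(": ").append(t[1]).append(" calls, ")
              .append(t[0] / 1_000_000).append(" ms total\n");
        }
        return sb.toString();
    }
}
```

Logging the per-phase totals every few thousand elements would show which phase grows as the transaction gets larger.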
My Titan configuration:
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.25
index.search.backend=solr
index.search.solr.mode=http
index.search.solr.http-urls=http://localhost:8983/solr/
schema.default=none
index.search.solr.wait-searcher=false
query.force-index=true
query.batch=true
query.fast-property=true
cache.tx-cache-size=4000000
Some new facts about my problem:
**It seems that the vertex/edge "get" part of my code (where I check whether an edge/vertex is new) and the property part of the code (where I add properties to a vertex) are the most time-consuming parts, and almost all of the throughput drop comes from them. Moreover, the throughput drop disappears completely if I commit the transaction frequently, which is not possible in my case.**
Upvotes: 1
Views: 510
Reputation: 1455
What you describe might be caused by the mechanism in Titan that is responsible for ID block allocation.
Each Titan element (an edge, a vertex or a property) is given a unique ID by Titan in a process called ID allocation, which is designed to guarantee that no other element is given the same ID, even though other Titan instances might be running concurrently.
This is an expensive process, involving multiple requests to your backend store (HBase, for example).
By default, Titan assigns IDs to elements the moment you create them. This might become a severe performance bottleneck.
Titan allocates a block of IDs each time. As long as it still has IDs available locally, it can assign element IDs and create elements quickly; when it runs out, it allocates another block by contacting the backend again. That might explain the sudden performance drop you experience.
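To illustrate the idea, here is a rough, self-contained sketch of block-based ID allocation. It is not Titan's actual implementation: `BackendIdAuthority`, `LocalIdPool` and the block size are hypothetical names and values chosen for the example, and the "backend" is just an in-memory counter that records how often it is contacted.

```java
// Sketch of block-based ID allocation; all class names here are hypothetical.
class IdBlockDemo {
    public static void main(String[] args) {
        BackendIdAuthority authority = new BackendIdAuthority();
        LocalIdPool pool = new LocalIdPool(authority, 1000);
        for (int i = 0; i < 2500; i++) {
            pool.nextId();
        }
        // Most calls are served locally; only a few reach the backend.
        System.out.println("backend round trips: " + authority.roundTrips());
    }
}

// Stand-in for the storage backend that hands out non-overlapping ID ranges.
class BackendIdAuthority {
    private long nextFree = 1;
    private int roundTrips = 0;

    // Reserves [start, start + blockSize) and returns start. In a real system
    // this is the expensive step, since it needs requests to the backend.
    synchronized long reserveBlock(int blockSize) {
        roundTrips++;
        long start = nextFree;
        nextFree += blockSize;
        return start;
    }

    synchronized int roundTrips() {
        return roundTrips;
    }
}

// Hands out IDs from the current block and fetches a new block when empty.
class LocalIdPool {
    private final BackendIdAuthority authority;
    private final int blockSize;
    private long next = 0;
    private long endExclusive = 0; // next == endExclusive means "block exhausted"

    LocalIdPool(BackendIdAuthority authority, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.authority = authority;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next == endExclusive) {
            // Slow path: the local block is used up, ask the backend for more.
            next = authority.reserveBlock(blockSize);
            endExclusive = next + blockSize;
        }
        return next++; // fast path: purely local
    }
}
```

In this model the cost of `nextId()` is uneven: cheap while the block lasts, expensive on the call that has to fetch a new block.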
You can make Titan allocate IDs only when you commit a transaction, by adding the following setting to your local Titan configuration file:
ids.flush=false
As mentioned in Titan's source:
When true, vertices and edges are assigned IDs immediately upon creation. When false, IDs are assigned only when the transaction commits.
Upvotes: 1