Reputation: 3548
While exploring the JanusGraph-core library, I saw the id generation code (StandardIDPool.nextID()), and it seems the id for a JanusGraph vertex is generated by the application logic. In that case, how can I horizontally scale an application that uses JanusGraph? Won't I run into id conflicts when several instances of the application are running?
What is the best approach to scaling an app that uses JanusGraph?
Upvotes: 2
Views: 218
Reputation: 1581
The JanusGraph instances for a graph select one instance that maintains an ID pool manager. The JanusGraph reference documentation says the following about optimizing ID allocation:
ID Block Size
Each newly added vertex or edge is assigned a unique id. JanusGraph’s id pool manager acquires ids in blocks for a particular JanusGraph instance. The id block acquisition process is expensive because it needs to guarantee globally unique assignment of blocks. Increasing ids.block-size reduces the number of acquisitions but potentially leaves many ids unassigned and hence wasted. For transactional workloads the default block size is reasonable, but during bulk loading vertices and edges are added much more frequently and in rapid succession. Hence, it is generally advisable to increase the block size by a factor of 10 or more depending on the number of vertices to be added per machine.
Rule of thumb: Set ids.block-size to the number of vertices you expect to add per JanusGraph instance per hour.
Important: All JanusGraph instances MUST be configured with the same value for ids.block-size to ensure proper id allocation. Hence, be careful to shut down all JanusGraph instances prior to changing this value.
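To make the block idea concrete, here is a minimal sketch of block-based id allocation. It is a toy model and not JanusGraph's actual implementation: `BlockAuthority` (standing in for the storage backend that guarantees globally unique blocks), `InstanceIdPool`, and `simulate` are hypothetical names made up for illustration. It shows why several instances never collide on ids, and how a larger block size cuts the number of expensive block acquisitions.

```python
import threading


class BlockAuthority:
    """Hypothetical stand-in for the shared backend that hands out id blocks."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.acquisitions = 0
        self._next_start = 1
        self._lock = threading.Lock()

    def acquire_block(self):
        # The lock models the "globally unique assignment of blocks" guarantee.
        with self._lock:
            start = self._next_start
            self._next_start += self.block_size
            self.acquisitions += 1
        return start, start + self.block_size  # half-open range [start, end)


class InstanceIdPool:
    """Per-instance pool: hands out ids locally until its block runs out."""

    def __init__(self, authority):
        self._authority = authority
        self._next = 0
        self._end = 0

    def next_id(self):
        if self._next >= self._end:
            # Block exhausted (or never acquired): fetch a fresh one.
            self._next, self._end = self._authority.acquire_block()
        value = self._next
        self._next += 1
        return value


def simulate(num_instances, ids_per_instance, block_size):
    """Interleave id requests from several instances sharing one authority."""
    authority = BlockAuthority(block_size)
    pools = [InstanceIdPool(authority) for _ in range(num_instances)]
    issued = []
    for _ in range(ids_per_instance):
        for pool in pools:
            issued.append(pool.next_id())
    return issued, authority.acquisitions


ids_small, acquisitions_small = simulate(3, 1000, block_size=100)
ids_large, acquisitions_large = simulate(3, 1000, block_size=1000)

print(len(set(ids_small)) == len(ids_small))   # no duplicate ids across instances
print(acquisitions_small, acquisitions_large)  # fewer acquisitions with bigger blocks
```

Each instance only contacts the shared authority when its current block is used up, so scaling out adds contention on block acquisition rather than on every single id.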
ID Acquisition Process
When id blocks are frequently allocated by many JanusGraph instances in parallel, allocation conflicts between instances will inevitably arise and slow down the allocation process. In addition, the increased write load due to bulk loading may further slow down the process to the point where JanusGraph considers it failed and throws an exception. There are three configuration options that can be tuned to avoid this.
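ids.authority.wait-time configures the time in milliseconds the id pool manager waits for an id block application to be acknowledged by the storage backend. Rule of thumb: Set this to the sum of the 95th percentile read and write times measured on the storage backend cluster under load. Important: This value should be the same across all JanusGraph instances.
ids.renew-timeout configures the number of milliseconds JanusGraph's id pool manager will wait in total while attempting to acquire a new id block before failing. Rule of thumb: Set this value to be as large as feasible to not have to wait too long for unrecoverable failures. The only downside of increasing it is that JanusGraph will try for a long time on an unavailable storage backend cluster.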
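As a rough illustration of the "sum of the 95th percentile read and write times" rule of thumb, the sketch below computes that sum from latency samples. As far as I recall, the reference documentation attaches this rule to ids.authority.wait-time. The latency numbers are invented for the example, and `percentile` and `suggested_wait_time_ms` are hypothetical helpers, not part of JanusGraph. In practice the samples would come from your storage backend's metrics under load.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def suggested_wait_time_ms(read_ms, write_ms):
    """Sum of the 95th percentile read and write latencies, in milliseconds."""
    return percentile(read_ms, 95) + percentile(write_ms, 95)


# Made-up latency samples in milliseconds, for illustration only.
read_ms = [4, 5, 5, 6, 7, 8, 9, 12, 20, 35]
write_ms = [6, 7, 8, 8, 9, 10, 11, 15, 25, 50]

wait_ms = suggested_wait_time_ms(read_ms, write_ms)
print(wait_ms)
```

Whatever value you derive this way would then need to be set identically on every JanusGraph instance, per the note above.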
Upvotes: 6