Pallav Raj

Reputation: 41

JanusGraph data ingestion at scale

We are ingesting data from BigQuery into JanusGraph running on Kubernetes (GCP) using Python. We already use multithreading, node chaining, and indexing, but we can still ingest only 100k records (nodes) in 60 minutes.

Kubernetes spec: 1 Pod with 25 vCPUs and 150 GiB RAM

  - ids.block-size = 10 million
  - thread pool: 16
  - node pool: 16
  - heap size: 4 GB

Questions:

  1. What other approaches can we take to increase ingestion performance and reduce the overall time?

  2. How many simultaneous connections can we open to JanusGraph via the Python driver? Currently we can create 30 threads (connections), but when we increase the number of threads, the connections either get stuck or performance goes down.

Any details or help would be highly appreciated.

Gremlin Query:

    g.V().hasLabel("http://purl.uniprot.org/core/Helix_Annotation")
      .has("id", "http://purl.uniprot.org/uniprot/P06931#SIPC5C5063B7561AB45")
      .has("node_id", "http://purl.uniprot.org/uniprot/P06931#SIPC5C5063B7561AB45")
      .has("rdf_type", "http://purl.uniprot.org/uniprot/")
      .has("http://purl.uniprot.org/core/range", "http://purl.uniprot.org/range/22571007582875950tt125tt127")
      .fold()
      .coalesce(
        unfold(),
        addV("http://purl.uniprot.org/core/Helix_Annotation")
          .property("id", "http://purl.uniprot.org/uniprot/P06931#SIPC5C5063B7561AB45")
          .property("node_id", "http://purl.uniprot.org/uniprot/P06931#SIPC5C5063B7561AB45")
          .property("rdf_type", "http://purl.uniprot.org/uniprot/")
          .property("http://purl.uniprot.org/core/range", "http://purl.uniprot.org/range/22571007582875950tt125tt127"))
      .V().hasLabel("http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement")
      .has("id", "http://purl.uniprot.org/uniprot/#_kb.P06931_up.annotation_FA85AD309172A9A7")
      .has("node_id", "http://purl.uniprot.org/uniprot/#_kb.P06931_up.annotation_FA85AD309172A9A7")
      .has("rdf_type", "http://purl.uniprot.org/uniprot/")
      .has("http://purl.uniprot.org/core/attribution", "http://purl.uniprot.org/uniprot/P06931#attribution-AE0E09C5B47CC2714C9061D3806995B4")
      .fold()
      .coalesce(
        unfold(),
        addV("http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement")
          .property("id", "http://purl.uniprot.org/uniprot/#_kb.P06931_up.annotation_FA85AD309172A9A7")
          .property("node_id", "http://purl.uniprot.org/uniprot/#_kb.P06931_up.annotation_FA85AD309172A9A7")
          .property("rdf_type", "http://purl.uniprot.org/uniprot/")
          .property("http://purl.uniprot.org/core/attribution", "http://purl.uniprot.org/uniprot/P06931#attribution-AE0E09C5B47CC2714C9061D3806995B4"))

Upvotes: 4

Views: 174

Answers (1)

Michael Scott

Reputation: 580

Have you tried using Groovy scripts? I was able to insert ~5K nodes in ~10 seconds, single-threaded, into a JanusGraph instance with far less RAM and CPU.
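To make the idea concrete, here is a rough sketch (not a tuned implementation) of sending one Groovy script per batch from Python, so a few hundred get-or-create operations share a single round trip. The names `UPSERT_SCRIPT`, `chunked`, `to_binding`, and `ingest`, the endpoint URL, and the batch size of 500 are all made up for illustration. It also assumes `node_id` alone identifies a vertex and is covered by a composite index, and that the server accepts script requests with a traversal source bound to `g`.

```python
from itertools import islice

# Groovy run on the server once per batch. `rows` arrives as a bound
# parameter, so the script text is identical for every request.
# Assumption: `node_id` uniquely identifies a vertex and is indexed.
UPSERT_SCRIPT = """
rows.each { row ->
    def v = g.V().has(row.label, 'node_id', row.node_id).tryNext().orElseGet {
        g.addV(row.label).property('node_id', row.node_id).next()
    }
    row.props.each { k, val -> g.V(v).property(k, val).iterate() }
}
rows.size()
"""


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def to_binding(label, node_id, props):
    """Shape one source record into the map the script expects."""
    return {"label": label, "node_id": node_id, "props": dict(props)}


def ingest(rows, url="ws://localhost:8182/gremlin", batch_size=500):
    """Send rows to the server in batches; returns the number of rows sent.

    The URL and batch size are placeholders, not recommendations.
    """
    # Imported lazily so the helpers above work without the driver installed.
    from gremlin_python.driver import client

    c = client.Client(url, "g")
    total = 0
    try:
        for batch in chunked(rows, batch_size):
            # One request per batch; a sessionless request is committed
            # by the server when the script finishes successfully.
            total += c.submit(UPSERT_SCRIPT, {"rows": batch}).all().result()[0]
    finally:
        c.close()
    return total
```

You would call it with something like `ingest(to_binding(r["label"], r["node_id"], r["props"]) for r in bigquery_rows)`, where `bigquery_rows` stands in for whatever iterator you already have. Because the values travel as bindings rather than being pasted into the query string, the server should be able to reuse the compiled script across batches.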

Upvotes: 0
