Reputation: 5265
I am looking to quickly insert multiple vertices using the Azure Cosmos DB Graph API. Most of the current Microsoft samples create the vertices one by one, executing a Gremlin query for each, like so:
IDocumentQuery<dynamic> query = client.CreateGremlinQuery<dynamic>(graph, "g.addV('person').property('id', 'thomas').property('name', 'Thomas').property('age', 44)");
while (query.HasMoreResults)
{
    foreach (dynamic result in await query.ExecuteNextAsync())
    {
        Console.WriteLine($"\t {JsonConvert.SerializeObject(result)}");
    }
    Console.WriteLine();
}

query = client.CreateGremlinQuery<dynamic>(graph, "g.addV('person').property('id', 'mary').property('name', 'Mary').property('lastName', 'Andersen').property('age', 39)");
while (query.HasMoreResults)
{
    foreach (dynamic result in await query.ExecuteNextAsync())
    {
        Console.WriteLine($"\t {JsonConvert.SerializeObject(result)}");
    }
    Console.WriteLine();
}
However, this is less than ideal when I want to create a couple of thousand vertices and edges to populate the graph initially, as this can take some time.
This is with the Microsoft.Azure.Graphs library, v0.2.0-preview.
How can I efficiently add multiple vertices at once to Cosmos DB so I may later query using the Graph API syntax?
Upvotes: 6
Views: 4203
Reputation: 609
We needed a tool to help us migrate data to Cosmos DB graph, but since nothing was available I ended up creating this: https://github.com/microsoft/migratetograph
You can use it to take data from SQL or JSON, transform it, and push it into the graph database.
It supports parallel execution of Gremlin queries, so it is considerably faster.
By default it fires 10 Gremlin queries in parallel, but you can increase that by setting batchSize in the graph-config file.
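The general pattern is the same whatever tool you use: fire a batch of Gremlin inserts concurrently instead of one at a time. This is not the tool's code, just a sketch of that idea using the client and graph objects from the question (gremlinStatements and batchSize are illustrative names):

// Sketch only: submit the insert statements in concurrent batches of 10.
const int batchSize = 10;
foreach (var batch in gremlinStatements
    .Select((statement, index) => new { statement, index })
    .GroupBy(x => x.index / batchSize, x => x.statement))
{
    // Start every statement in this batch, then wait for the whole batch to finish.
    var tasks = batch.Select(async statement =>
    {
        IDocumentQuery<dynamic> query = client.CreateGremlinQuery<dynamic>(graph, statement);
        while (query.HasMoreResults)
        {
            await query.ExecuteNextAsync();
        }
    });
    await Task.WhenAll(tasks);
}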
Upvotes: 1
Reputation: 1
I'm using this code to upsert multiple vertices with Node.js:
const gremlin = require('gremlin');
const __ = gremlin.process.statics;
// g is a graph traversal source already connected to the Cosmos DB Gremlin endpoint

// Upsert two vertices in one chained traversal
let trt = await g.withBulk(true)
    .V('test-3').fold().coalesce(__.unfold().property(gremlin.process.cardinality.single, 'runways', 4), __.addV('truongtest').property(gremlin.process.t.id, 'test-3').property(gremlin.process.cardinality.single, 'runways', 4))
    .V('test-10').fold().coalesce(__.unfold().property(gremlin.process.cardinality.single, 'runways', 100), __.addV('truongtest').property(gremlin.process.t.id, 'test-10').property(gremlin.process.cardinality.single, 'runways', 100))
    .next();

// If you want to add a lot of vertices, build the traversal up step by step (for example inside a loop over your data)
let traversal = g.withBulk(true);
traversal = traversal.V('test-3').fold().coalesce(__.unfold().property(gremlin.process.cardinality.single, 'runways', 4), __.addV('truongtest').property(gremlin.process.t.id, 'test-3').property(gremlin.process.cardinality.single, 'runways', 4));
traversal = traversal.V('test-10').fold().coalesce(__.unfold().property(gremlin.process.cardinality.single, 'runways', 100), __.addV('truongtest').property(gremlin.process.t.id, 'test-10').property(gremlin.process.cardinality.single, 'runways', 100));
// When the traversal is fully built, execute it with next()
await traversal.next();
Upvotes: 0
Reputation: 332
The Data Migration Tool covers SQL API and MongoDB scenarios, but it does not support Graph API vertices and edges out of the box at this stage. As mentioned earlier, you could use a generated graph query result as a reference pattern and then do some search-and-replace on your source to end up with the proper format, though I found that simply running a console application to stream the data was more adequate. I was able to reuse the same console app for the Marvel as well as the airport flights scenarios, and all I needed to do was modify a couple of lines of code each time.
The code runs in two sequences: the first block extracts and converts the vertices, and the second extracts and converts the field relationships as edges, so all I had to change were the fields to extract (see the sketch below). This can take a bit of time depending on the size of the data, but it gave me the exact expected results each time without having to constantly modify the data at the source 😉.
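Not the actual app, but the two-pass shape is roughly this (the airports/flights source objects and their fields are placeholders for your own data):

// Pass 1: one addV statement per source record.
var statements = new List<string>();
foreach (var airport in airports)
{
    statements.Add($"g.addV('airport').property('id', '{airport.Code}').property('name', '{airport.Name}')");
}
// Pass 2: one addE statement per relationship, once the vertices exist.
foreach (var flight in flights)
{
    statements.Add($"g.V('{flight.From}').addE('flightTo').to(g.V('{flight.To}'))");
}
// Execute each statement with the same CreateGremlinQuery pattern as in the question.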
Upvotes: 0
Reputation: 21197
I've found that the fastest way to seed your graph is actually to use the Document API. Using this technique I've been able to insert 5500+ vertices/edges per second on a single development machine. The trick is to understand the format that Cosmos expects for both edges and vertices. Just add a couple of vertices and edges to your graph through the Gremlin API, then inspect the format of those documents by going to the Data Explorer in Azure and executing the document query SELECT * FROM c.
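As a rough sketch of that approach: the _isEdge / _sink / _vertexId field names and the array-of-{ id, _value } property layout below match what the portal typically shows, but treat them as assumptions about an internal format and verify them against your own collection before bulk loading. Database and collection names are placeholders.

var collectionUri = UriFactory.CreateDocumentCollectionUri("graphdb", "graphcoll");

// A vertex: each user property is stored as an array of { id, _value } entries.
var vertex = new
{
    id = "thomas",
    label = "person",
    name = new[] { new { id = Guid.NewGuid().ToString(), _value = "Thomas" } },
    age = new[] { new { id = Guid.NewGuid().ToString(), _value = 44 } }
};
await client.CreateDocumentAsync(collectionUri, vertex);

// An edge: a flat document that points at its source (out) and sink (in) vertices.
var edge = new
{
    id = Guid.NewGuid().ToString(),
    label = "knows",
    _isEdge = true,
    _vertexId = "thomas",      // out-vertex id
    _vertexLabel = "person",
    _sink = "mary",            // in-vertex id
    _sinkLabel = "person"
};
await client.CreateDocumentAsync(collectionUri, edge);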
At work I've built a light ORM that uses reflection to take POCOs for edges and vertices and convert them to the format that you see in the portal. I'm hoping to open source this soon, at which point I'll most likely release a NuGet package and an accompanying blog post. Hopefully in the meantime this will help point you in the right direction; let me know if you have more questions about this approach.
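As a very rough illustration of that reflection idea (this is not the actual library, just a hypothetical helper that produces the same document shape as in the sketch above):

// Hypothetical sketch: turn a vertex POCO into the portal-style document shape.
static Dictionary<string, object> ToVertexDocument(object poco, string label, string id)
{
    var doc = new Dictionary<string, object> { ["id"] = id, ["label"] = label };
    foreach (PropertyInfo prop in poco.GetType().GetProperties())
    {
        // Each POCO property becomes an array of { id, _value } entries.
        doc[prop.Name] = new[]
        {
            new { id = Guid.NewGuid().ToString(), _value = prop.GetValue(poco) }
        };
    }
    return doc;
}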
Upvotes: 6
Reputation: 506
Assuming Cosmos DB is 100% TinkerPop compliant, and depending on the Gremlin executor timeout setting, you should be able to update your Gremlin script to do several operations in one go.
For example:
g.addV('person').property('id', 'mary').property('name', 'Mary').property('lastName', 'Andersen').property('age', 39)
can be transformed into:
g.addV('person').property('id', 'mary').property('name', 'Mary').property('lastName', 'Andersen').property('age', 39); g.addV('person').property('id', 'david').property('name', 'David').property('lastName', 'P').property('age', 24);
and so on.
Your Gremlin script is also just Groovy code, so you could even write loops and the like to create vertices, append properties, etc.
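From the .NET client in the question, submitting such a combined script is the same call with a longer string. Whether multiple statements per request are honored depends on the server's Gremlin support, so test it on a small batch first:

// Sketch: send several addV statements in a single request.
string script =
    "g.addV('person').property('id', 'mary').property('name', 'Mary').property('lastName', 'Andersen').property('age', 39); " +
    "g.addV('person').property('id', 'david').property('name', 'David').property('lastName', 'P').property('age', 24);";

IDocumentQuery<dynamic> query = client.CreateGremlinQuery<dynamic>(graph, script);
while (query.HasMoreResults)
{
    await query.ExecuteNextAsync();
}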
Upvotes: 1