Reputation: 378
I have my data modeled in Java as Entities and Relationships, where each Entity has a List of Relationships. An incoming entity-request can carry a list of Entities that need to be created in the graph DB (Amazon Neptune), which I access using Gremlin. I have to loop through the list of entities once to create the vertices in the graph, and then loop through the entities again, while looping through each entity's relationships, to create the edges. This isn't the most elegant way to handle this, so is there a way I can optimize my data model and/or Gremlin queries? See the code below for reference.
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EntityRequest {
    Set<Entity> entities;
    // getter
    // builder
    // constructors etc
}

public class Entity {
    String id;
    String entityType;
    Map<String, Object> attributes; // property name -> value
    List<Relationship> relationships;
    // getter
    // builder
    // constructors etc
}

public class Relationship {
    String id;
    String type;
    Map<String, Object> RelationshipMetaData;
}
public class EntityCreationServiceImpl {
    public void createEntitiesinGraph(EntityRequest request, GraphTraversalSource g) {
        Set<Entity> eSet = request.getEntities();
        // First pass: create one vertex per entity.
        for (Entity e : eSet) {
            // create the vertex for e
        }
        // Second pass: create the edges for each entity's relationships.
        for (Entity e : eSet) {
            for (Relationship r : e.getRelationships()) {
                // create the edge for r
            }
        }
    }
}
It works and creates the entities in the Neptune DB, but as you can see it is not optimized for performance. Is there a better way to do this?
Upvotes: 1
Views: 239
Reputation: 1419
For 10k entities, I would use the Neptune bulk loader, which takes CSV files from S3 and loads them into Neptune efficiently. In your case the flow would be: serialize the entities to CSV, upload the files to S3, and call the loader API.
However, for the usual case of a handful of entities this would probably be overkill.
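For the bulk-loader route, the serialization step could look roughly like the sketch below. It is only an illustration: `Ent` and `Rel` are hypothetical, stripped-down stand-ins for the classes in the question, it assumes that a relationship's id is the id of the target entity, and the edge id scheme is made up. The `~id`, `~label`, `~from` and `~to` headers are the system columns of the loader's Gremlin CSV format; attribute columns (headers of the form `name:type`) are left out because the attribute types are not known here.

```java
import java.util.List;

// Hypothetical sketch: turn entities into the two CSV files the Neptune bulk loader reads.
final class NeptuneCsvSketch {

    // Stand-in for Relationship. Assumption: targetId is the id of the entity the edge points to.
    static final class Rel {
        final String type;
        final String targetId;
        Rel(String type, String targetId) { this.type = type; this.targetId = targetId; }
    }

    // Stand-in for Entity (attributes omitted).
    static final class Ent {
        final String id;
        final String entityType;
        final List<Rel> relationships;
        Ent(String id, String entityType, List<Rel> relationships) {
            this.id = id;
            this.entityType = entityType;
            this.relationships = relationships;
        }
    }

    // Quote a field: wrap it in double quotes and double any embedded quotes.
    static String csv(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Vertex file: one row per entity.
    static String verticesCsv(List<Ent> entities) {
        StringBuilder sb = new StringBuilder("~id,~label\n");
        for (Ent e : entities) {
            sb.append(csv(e.id)).append(',').append(csv(e.entityType)).append('\n');
        }
        return sb.toString();
    }

    // Edge file: one row per relationship.
    static String edgesCsv(List<Ent> entities) {
        StringBuilder sb = new StringBuilder("~id,~from,~to,~label\n");
        for (Ent e : entities) {
            for (Rel r : e.relationships) {
                // Made-up edge id scheme: source, type and target joined with dashes.
                String edgeId = e.id + "-" + r.type + "-" + r.targetId;
                sb.append(csv(edgeId)).append(',')
                  .append(csv(e.id)).append(',')
                  .append(csv(r.targetId)).append(',')
                  .append(csv(r.type)).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Ent> entities = List.of(
            new Ent("p1", "Person", List.of(new Rel("KNOWS", "p2"))),
            new Ent("p2", "Person", List.of()));
        System.out.print(verticesCsv(entities));
        System.out.print(edgesCsv(entities));
    }
}
```

The two strings would then be written to files, uploaded to S3, and referenced in the loader request; those steps are not shown.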
Since the DB might already contain some of the vertices, you should use coalesce() to look up each vertex and create it only if it does not exist. You can chain the edge creation in the same query and optionally create the edge's target vertex if it doesn't exist:
g.V().has(foo,bar).fold().coalesce(unfold(),addV(type).property(foo,bar)).as('v')
  .addE(label).from('v').to(V().has(...).fold().coalesce(unfold(),addV(...)))
  .addE(label).from('v').to(V().has(...).fold().coalesce(unfold(),addV(...)))
This way you iterate over the entities only once and execute n queries, one per entity.
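As a rough sketch of that single pass, the snippet below builds one such query string per entity. Everything model-related is an assumption: `Ent` and `Rel` are hypothetical stand-ins for the question's classes, `entityId` is a made-up property key used for lookups, and each relationship is assumed to carry the target's type and id. With the gremlin-java driver you would more likely build the same traversal on the `GraphTraversalSource` directly rather than as a string; plain strings are used here only to keep the example dependency-free.

```java
import java.util.List;

// Hypothetical sketch: one upsert query per entity, with its edges chained on.
final class UpsertQuerySketch {

    // Stand-in for Relationship. Assumption: it knows the target vertex's type and id.
    static final class Rel {
        final String type;
        final String targetType;
        final String targetId;
        Rel(String type, String targetType, String targetId) {
            this.type = type;
            this.targetType = targetType;
            this.targetId = targetId;
        }
    }

    // Stand-in for Entity (attributes omitted).
    static final class Ent {
        final String id;
        final String entityType;
        final List<Rel> relationships;
        Ent(String id, String entityType, List<Rel> relationships) {
            this.id = id;
            this.entityType = entityType;
            this.relationships = relationships;
        }
    }

    // Single-quoted string literal with backslashes and single quotes escaped.
    static String lit(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // "Find the vertex or create it" fragment, following the fold/coalesce/unfold pattern.
    static String upsertVertex(String type, String id) {
        return "has(" + lit(type) + ",'entityId'," + lit(id) + ").fold()"
             + ".coalesce(unfold(),addV(" + lit(type) + ").property('entityId'," + lit(id) + "))";
    }

    // One query per entity: upsert the vertex, then add an edge per relationship.
    static String buildQuery(Ent e) {
        StringBuilder q = new StringBuilder("g.V().")
            .append(upsertVertex(e.entityType, e.id))
            .append(".as('v')");
        for (Rel r : e.relationships) {
            q.append(".addE(").append(lit(r.type)).append(").from('v')")
             .append(".to(V().").append(upsertVertex(r.targetType, r.targetId)).append(")");
        }
        return q.toString();
    }

    public static void main(String[] args) {
        Ent e = new Ent("p1", "Person", List.of(new Rel("KNOWS", "Person", "p2")));
        System.out.println(buildQuery(e));
    }
}
```

Note that only the vertices are guarded by coalesce() in this sketch; addE() adds a new edge every time the query runs, so re-sending the same entity would duplicate its edges.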
Upvotes: 1