Tarnished-Coder

Reputation: 378

Is there a better way to model my entities and relationships for a graph DB (using Gremlin)?

I have my data modeled in Java as Entities and Relationships, where each Entity has a List of Relationships. An incoming request (the entity-request) can carry a List of Entities that need to be created in the graph DB (Neptune, accessed via Gremlin). I have to loop through the list of entities once to create the vertices in the graph, and then loop through the entities again, iterating over each entity's relationships, to create the corresponding edges. This isn't the most elegant way to handle this, so is there a way I can optimize my data model and/or my Gremlin queries? See the code below for reference.

public class EntityRequest{
  Set<Entity> entities;
  // getter
  // builder
  // constructors etc
}
public class Entity{
  String id;
  String entityType;
  Map<String, Object> attributes;
  List<Relationship> relationships;
  // getter
  // builder
  // constructors etc
}
public class Relationship{
  String id;
  String type;
  Map<String, Object> relationshipMetaData;
}
public class EntityCreationServiceImpl {
  public void createEntitiesInGraph(EntityRequest request, GraphTraversalSource g) {

    Set<Entity> entities = request.getEntities();

    // first loop: create a vertex for every entity
    for (Entity e : entities) {
      g.addV(e.getEntityType()).property("entityId", e.getId()).iterate();
    }

    // second loop: create the edges from each entity's relationships
    for (Entity e : entities) {
      for (Relationship r : e.getRelationships()) {
        g.V().has("entityId", e.getId())
            .addE(r.getType())
            .to(__.V().has("entityId", r.getId()))
            .iterate();
      }
    }
  }
}

It is working and creating the entities in the Neptune DB, but as you can see it is not performance-optimized. Is there a better way to do this?

Upvotes: 1

Views: 239

Answers (1)

Kfir Dadosh

Reputation: 1419

For 10k entities, I would use the Neptune bulk loader, which takes CSV files from S3 and loads them into Neptune efficiently. In your case the flow would be: serialize the entities to CSV, upload the files to S3, and call the load API.
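For reference, here is roughly what those CSVs could look like. Neptune's Gremlin load format uses the reserved ~id, ~label, ~from, and ~to columns; the file names, labels, and the name:String property column below are purely illustrative:

vertices.csv (one row per Entity; entityType becomes the label):

~id,~label,name:String
e1,person,Alice
e2,person,Bob

edges.csv (one row per Relationship):

~id,~from,~to,~label
r1,e1,e2,knows

The load itself is an HTTP POST to the cluster's /loader endpoint, pointing at the S3 location and at an IAM role the cluster can assume to read it.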

However, for the usual case of only a few entries this would probably be overkill.

Since the DB might already contain some of the vertices, you should use coalesce to look each vertex up and create it only if it doesn't exist. You can chain the edge creation onto the same query, and optionally create the edge's target vertex if it doesn't exist:

g.V().has(foo, bar).fold().coalesce(unfold(), addV(type).property(foo, bar)).as('v')
 .addE(relType).from('v').to(V().has(...).fold().coalesce(unfold(), addV(...)))
 .addE(relType).from('v').to(V().has(...).fold().coalesce(unfold(), addV(...)))

This way, you only iterate over the entries once and execute n queries, one per entity.
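Applied to the question's Java model, a minimal sketch of that single-pass pattern could look like the following. It assumes a hypothetical entityId property key for lookups and that Relationship.id holds the id of the target vertex; since the posted Relationship class carries no label or properties for the target, the bare addV() fallback is just a placeholder:

import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.addV;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.unfold;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.structure.Vertex;

public class EntityCreationServiceImpl {

  public void createEntitiesInGraph(EntityRequest request, GraphTraversalSource g) {
    for (Entity e : request.getEntities()) {
      // Upsert the entity vertex: reuse it if present, create it otherwise.
      GraphTraversal<Vertex, ?> t = g.V().has("entityId", e.getId()).fold()
          .coalesce(unfold(),
                    addV(e.getEntityType()).property("entityId", e.getId()))
          .as("v");

      // Chain one addE per relationship onto the same traversal,
      // upserting the target vertex inside to().
      for (Relationship r : e.getRelationships()) {
        t = t.addE(r.getType()).from("v")
             .to(__.V().has("entityId", r.getId()).fold()
                  .coalesce(__.<Vertex>unfold(),
                            addV().property("entityId", r.getId())));
      }

      t.iterate(); // one request to Neptune per entity
    }
  }
}

With this shape, each entity costs a single round trip to Neptune, instead of one query per vertex plus one per edge.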

Upvotes: 1
