Tarnished-Coder

Reputation: 378

Is there a better way to model my entities and relationships for a graph DB (using Gremlin)?

I have my data modeled in Java as Entities and Relationships, where each Entity has a List of Relationships. An incoming request (the entity-request) can carry a List of Entities that need to be created in the graph DB (Neptune, accessed via Gremlin). I have to loop through the list of entities once to create the vertices in the graph, and then loop through the entities again, iterating over each entity's relationships, to create the corresponding edges. This isn't the most elegant way to handle this, so is there a way I can optimize my data model and/or my Gremlin queries? See the code below for reference.

public class EntityRequest{
  Set<Entity> entities;
  // getter
  // builder
  // constructors etc
}
public class Entity{
  String id;
  String entityType;
  Map<String, Object> attributes;
  List<Relationship> relationships;
  // getter
  // builder
  // constructors etc
}
public class Relationship{
  String id;
  String type;
  Map<String, Object> relationshipMetaData;
}
public class EntityCreationServiceImpl {
  public void createEntitiesInGraph(EntityRequest request, GraphTraversalSource g) {

    Set<Entity> entities = request.getEntities();

    // first loop: create a vertex for every entity
    for (Entity e : entities) {
      g.addV(e.getEntityType()).property("entityId", e.getId()).iterate();
    }

    // second loop: create the edges from each entity's relationships
    for (Entity e : entities) {
      for (Relationship r : e.getRelationships()) {
        g.V().has("entityId", e.getId())
            .addE(r.getType())
            .to(__.V().has("entityId", r.getId()))
            .iterate();
      }
    }
  }
}

It is working and creating the entities in the Neptune DB, but as you can see it is not performance-optimized. Is there a better way to do this?

Upvotes: 1

Views: 239

Answers (1)

Kfir Dadosh

Reputation: 1419

For 10k entities, I would use the Neptune bulk loader, which takes CSV files from S3 and loads them into Neptune efficiently. In your case the flow would be: serialize the entities to CSV, upload the files to S3, and call the load API.
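For reference, here is roughly what those CSVs could look like. Neptune's Gremlin load format uses the reserved ~id, ~label, ~from, and ~to columns; the file names, labels, and the name:String property column below are purely illustrative:

vertices.csv (one row per Entity; entityType becomes the label):

~id,~label,name:String
e1,person,Alice
e2,person,Bob

edges.csv (one row per Relationship):

~id,~from,~to,~label
r1,e1,e2,knows

The load itself is an HTTP POST to the cluster's /loader endpoint, pointing at the S3 location and at an IAM role the cluster can assume to read it.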

However, for the usual case of only a few entries this would probably be overkill.

Since the DB might already contain some of the vertices, you should use coalesce to look each vertex up and create it only if it doesn't exist. You can chain the edge creation onto the same query, and optionally create the edge's target vertex if it doesn't exist:

g.V().has(foo, bar).fold().coalesce(unfold(), addV(type).property(foo, bar)).as('v')
 .addE(relType).from('v').to(V().has(...).fold().coalesce(unfold(), addV(...)))
 .addE(relType).from('v').to(V().has(...).fold().coalesce(unfold(), addV(...)))

This way, you only iterate over the entries once and execute n queries, one per entity.
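Applied to the question's Java model, a minimal sketch of that single-pass pattern could look like the following. It assumes a hypothetical entityId property key for lookups and that Relationship.id holds the id of the target vertex; since the posted Relationship class carries no label or properties for the target, the bare addV() fallback is just a placeholder:

import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.addV;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.unfold;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.structure.Vertex;

public class EntityCreationServiceImpl {

  public void createEntitiesInGraph(EntityRequest request, GraphTraversalSource g) {
    for (Entity e : request.getEntities()) {
      // Upsert the entity vertex: reuse it if present, create it otherwise.
      GraphTraversal<Vertex, ?> t = g.V().has("entityId", e.getId()).fold()
          .coalesce(unfold(),
                    addV(e.getEntityType()).property("entityId", e.getId()))
          .as("v");

      // Chain one addE per relationship onto the same traversal,
      // upserting the target vertex inside to().
      for (Relationship r : e.getRelationships()) {
        t = t.addE(r.getType()).from("v")
             .to(__.V().has("entityId", r.getId()).fold()
                  .coalesce(__.<Vertex>unfold(),
                            addV().property("entityId", r.getId())));
      }

      t.iterate(); // one request to Neptune per entity
    }
  }
}

With this shape, each entity costs a single round trip to Neptune, instead of one query per vertex plus one per edge.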

Upvotes: 1
