Reputation: 4582
In the Dgraph paper, it is mentioned that entities are sharded randomly across servers and they carry relationships with other entities with themselves. This results in a high fan-out result set in intermediate steps of a graph query causing broadcasting on servers and high latency to perform joins.
I do not understand the meaning of fan-out here.
I tried understanding from the basics and this is what I found:
I am not certain how the above to points relate to an entity based graph as discussed in the Dgraph paper. Is it the case that entities have many relationships with other entities and so, there are many edges attached to one entity?
Upvotes: 0
Views: 1145
Reputation: 739
In short: Yes, there can be cases where an entity can have many edges attached to it.
In the first paper, shards are based on the predicate "lives-in" and "eats". In this description all of the "lives-in" data is in one shard and all of the "eats" data is in another shard and the paper says: "Assuming the worst case scenario where the cluster is so big each shard lives on a separate server."
Now, consider a query that covers several predicates, each in its own shard, and each shard living on a different server.
Find any person
that lives-in ???,
that eats ???,
that drinks ???,
that drives ???,
that reads ???
As the number of predicates grows, (i.e. the number of interrogated shards), we see the high fan-out described by the paper. The number of cluster "broadcasts" grows significantly, causing the query latency to grow proportionally.
Upvotes: 0