Sumit Pal

Reputation: 443

GraphX Explanation

I have a couple of fundamental questions related to GraphX on Spark. Is there a resource that can help me understand how GraphX works under the covers, in terms of:
- How is parallelism done?
- How is the graph partitioned?
- Can any graph algorithm be implemented in GraphX, or only specific kinds of problems? For example, for bipartite graphs, can we write a matching algorithm using path augmentation?

Any help would be much appreciated.

Upvotes: 0

Views: 519

Answers (1)

Sumit Pal

Reputation: 443

(Answer provided to me by Michael Malak, author of the upcoming book GraphX in Action, Manning Press.)

These are great questions, and ones I should make sure are addressed in the book.

Three major caveats to GraphX:
1. It's graph processing, not a graph database (this one is already mentioned in the book).
2. It's suited for massively parallel vertex-to-vertex communications in a SIMD-style execution model. It is not suited for classic graph algorithms, which is why the implementations in chapter 6 are not a great fit for GraphX.
3. The dirty little secret is that although there is API control over how the edges are partitioned (PartitionStrategy), vertices are always hash-partitioned by ID, which is effectively random. Worst of all, edges and vertices are partitioned independently, so all opportunity for data locality is lost.
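
To make the third caveat a bit more concrete, here is a toy sketch in plain Python (not GraphX code) of what "partitioned independently" can mean in practice. Everything in it is an assumption made for illustration: the two hash functions, the names `vertex_partition`, `edge_partition` and `count_remote_endpoints`, and the four-vertex ring graph are all invented, and none of them are taken from GraphX itself. The only point is that when edges and vertices are placed by unrelated rules, an edge often ends up on a different partition than the vertices it connects.

```python
# Toy model only: these hash functions are invented for illustration
# and are NOT the ones GraphX uses.

def vertex_partition(vertex_id, num_parts):
    """Place a vertex by hashing its ID (here: plain modulo)."""
    return vertex_id % num_parts


def edge_partition(src, dst, num_parts):
    """Place an edge by an unrelated hash of its two endpoint IDs."""
    return (src * 7 + dst * 13) % num_parts


def count_remote_endpoints(edges, num_parts):
    """Count edge endpoints whose vertex lives on a different
    partition than the edge itself."""
    remote = 0
    for src, dst in edges:
        ep = edge_partition(src, dst, num_parts)
        for v in (src, dst):
            if vertex_partition(v, num_parts) != ep:
                remote += 1
    return remote


# A hypothetical 4-vertex ring, split over 2 partitions.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
total = 2 * len(ring)
print(count_remote_endpoints(ring, 2), "of", total, "endpoints are remote")
```

In this model, each remote endpoint stands for a vertex attribute that would have to be shipped to the edge's partition before that edge could be processed. With a single partition the count is trivially zero; with more partitions it depends entirely on how the two unrelated placement rules happen to line up.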

There is, however, a slightly unexpected optimization intrinsic to GraphX internals, and that is that each edge has routing information to the vertices.

Upvotes: 3
