Romeo Kienzler

Reputation: 3539

How to create a VertexId in Apache Spark GraphX using a Long data type?

I'm trying to create a Graph using some Google Web Graph data which can be found here:

https://snap.stanford.edu/data/web-Google.html

import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD



val textFile = sc.textFile("hdfs://n018-data.hursley.ibm.com/user/romeo/web-Google.txt")
val arrayForm = textFile.filter(_.charAt(0)!='#').map(_.split("\\s+")).cache()
val nodes = arrayForm.flatMap(array => array).distinct().map(_.toLong)
val edges = arrayForm.map(line => Edge(line(0).toLong,line(1).toLong))

val graph = Graph(nodes,edges)

Unfortunately, I get this error:

<console>:27: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[Long]
 required: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId, ?)]
Error occurred in an application involving default arguments.
       val graph = Graph(nodes,edges)

So how can I create a VertexId object? As I understand it, passing a Long should be sufficient.

Any ideas?

Thanks a lot!

romeo

Upvotes: 5

Views: 6557

Answers (2)

zero323

Reputation: 330113

Not exactly. If you take a look at the signature of the apply method of the Graph object you'll see something like this (for a full signature see API docs):

apply[VD, ED](
    vertices: RDD[(VertexId, VD)], edges: RDD[Edge[ED]], defaultVertexAttr: VD)

As the description says:

Construct a graph from a collection of vertices and edges with attributes.

Because of that, you cannot simply pass an RDD[Long] as the vertices argument; each element has to be a (VertexId, attribute) pair. (An RDD[Edge[Nothing]] as edges won't work either.)

import scala.{Option, None}

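// Pair every ID with an (empty) attribute. An explicit lambda is needed here:
// (_.toLong, None) would be parsed as a tuple holding a function, not as a function.
val nodes: RDD[(VertexId, Option[String])] = arrayForm.
    flatMap(array => array).
    map(id => (id.toLong, Option.empty[String]))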

val edges: RDD[Edge[String]] = arrayForm.
    map(line => Edge(line(0).toLong, line(1).toLong, ""))
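
To see the pieces fit together, here is a minimal sketch that builds the graph end to end. It rests on a few assumptions that are not part of the question: a tiny made-up sample replaces web-Google.txt, the comment filter uses startsWith, and the context is obtained with SparkContext.getOrCreate so the snippet also runs outside spark-shell in local mode. VertexId is a type alias for Long in GraphX, so toLong already produces a valid ID; what Graph.apply needs in addition is an attribute for each vertex.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Reuses the running context (e.g. in spark-shell) or starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("vertex-id-sketch").setMaster("local[*]"))

// Made-up sample standing in for web-Google.txt: one comment line, three edges.
val lines: RDD[String] =
  sc.parallelize(Seq("# FromNodeId ToNodeId", "0 1", "0 2", "1 2"))
val arrayForm = lines.filter(!_.startsWith("#")).map(_.split("\\s+"))

// Vertices are (VertexId, attribute) pairs; the attribute here is an empty Option.
val nodes: RDD[(VertexId, Option[String])] =
  arrayForm.flatMap(array => array).map(id => (id.toLong, Option.empty[String]))

// Edges carry an attribute as well; an empty String serves as a placeholder.
val edges: RDD[Edge[String]] =
  arrayForm.map(line => Edge(line(0).toLong, line(1).toLong, ""))

val graph: Graph[Option[String], String] = Graph(nodes, edges)
println(s"vertices = ${graph.vertices.count()}, edges = ${graph.edges.count()}")
```

The nodes RDD still contains repeated IDs at this point; Graph merges them when it builds its vertex set.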

Note that:

Duplicate vertices are picked arbitrarily

so .distinct() on nodes is unnecessary in this case.

If you want to create a Graph without attributes you can use Graph.fromEdgeTuples.
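
A short sketch of the Graph.fromEdgeTuples route, assuming the edges have already been parsed into (source, destination) pairs. The sample pairs and the default vertex value 1 are arbitrary placeholders, and the context is again obtained with SparkContext.getOrCreate so the snippet can run in local mode.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Reuses the running context (e.g. in spark-shell) or starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("from-edge-tuples-sketch").setMaster("local[*]"))

// Made-up (source, destination) pairs standing in for the parsed file.
val rawEdges: RDD[(VertexId, VertexId)] =
  sc.parallelize(Seq((0L, 1L), (0L, 2L), (1L, 2L)))

// The second argument is the value given to every vertex mentioned by an edge.
// GraphX assigns the edge attributes itself, hence the Graph[Int, Int] type.
val graph: Graph[Int, Int] = Graph.fromEdgeTuples(rawEdges, 1)
println(s"vertices = ${graph.vertices.count()}, edges = ${graph.edges.count()}")
```

With this builder there is no separate vertex RDD to assemble, which suits data such as an edge list where the nodes have no attributes of their own.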

Upvotes: 4

Nikita

Reputation: 4515

The error message says that nodes must be of type RDD[(VertexId, something)], where VertexId is just a Long. The first element of the tuple is the vertex ID and the second can be anything, for example a String with a node description. Try simply repeating the vertex ID:

val nodes = arrayForm
             .flatMap(array => array)
             .distinct()
             .map(x =>(x.toLong, x.toLong))

Upvotes: 2
