qin.sun

Reputation: 73

How to generate random vector in Spark

I want to generate random vectors with norm 1 in Spark.

Since the vector could be very large, I want it to be distributed. And since data in an RDD has no order, I want to store the vector as an RDD[(Int, Double)], because I also need to use this vector for some matrix-vector multiplications.

So how could I generate this kind of vector?

Here is my plan for now:

import org.apache.spark.mllib.random.RandomRDDs.normalRDD

val v = normalRDD(sc, n, NUM_NODE)
val mod = GetMod(v)       // Get the norm (modulus) of v
val res = v.map(x => x / mod)
val arr: Array[Double] = res.collect()

var tuples = List.empty[(Int, Double)]
for (i <- 0 to (arr.length - 1)) {
  tuples = (i, arr(i)) :: tuples
}
// Get the entries and the length of the vector.
val entries = sc.parallelize(tuples)
val length = arr.length

I don't think this is elegant enough, because it goes through a "distributed -> single node -> distributed" process.

Is there a better way? Thanks :D

Upvotes: 0

Views: 1156

Answers (2)

Rami

Reputation: 8314

You can use this function to generate a random vector; then you can normalise it by dividing each element by the sum() of the vector, or by using a normalizer.
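One caveat: for a unit *L2* norm (what the question asks for), the divisor should be the square root of the sum of squared entries, not the plain sum, which would give L1-style normalisation and only works for non-negative values. A minimal sketch of that arithmetic in plain Scala, with a local Seq standing in for the RDD:

```scala
import scala.math.sqrt

// Toy vector; in Spark this would be the values of an RDD[(Int, Double)].
val v = Seq(3.0, 4.0)

// L2 norm: square root of the sum of squares.
val norm = sqrt(v.map(x => x * x).sum)

// Divide every entry by the norm; the result has norm 1.
val unit = v.map(_ / norm)
```

With an actual RDD, the same two steps become a map followed by the sum action, and then a mapValues to scale each entry.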

Upvotes: 0

jtitusj

Reputation: 3084

try this:

import scala.util.Random
import scala.math.sqrt

val n = 5 // desired length of the vector
// Pair each index with a random value and distribute the pairs.
// Note: use `until`, not `to`, so the vector has exactly n entries.
val randomRDD = sc.parallelize(for (i <- 0 until n) yield (i, Random.nextDouble))
// Compute the L2 norm, then divide every entry by it so the vector has norm 1.
val norm = sqrt(randomRDD.map(x => x._2 * x._2).sum())
val finalRDD = randomRDD.mapValues(x => x / norm)

Upvotes: 1
