Reputation: 721
I am trying to convert the following Java code segment to Scala:
public org.apache.spark.mllib.linalg.Vector call(Tuple2<IntWritable, VectorWritable> arg0)
        throws Exception {
    org.apache.mahout.math.Vector mahoutVector = arg0._2.get();
    Iterator<Element> elements = mahoutVector.nonZeroes().iterator();
    ArrayList<Tuple2<Integer, Double>> tupleList = new ArrayList<Tuple2<Integer, Double>>();
    while (elements.hasNext()) {
        Element e = elements.next();
        if (e.index() >= nCols || e.get() == 0)
            continue;
        Tuple2<Integer, Double> tuple = new Tuple2<Integer, Double>(e.index(), e.get());
        tupleList.add(tuple);
    }
    org.apache.spark.mllib.linalg.Vector sparkVector = Vectors.sparse(nCols, tupleList);
    return sparkVector;
}
I am fairly new to Scala, so I don't know how to convert it properly. So far I have:
def transformSvec(x: Vector): org.apache.spark.mllib.linalg.Vector = {
  val iter = x.nonZeroes.iterator()
  // iterate the items and add to an ArrayList
  // or an Iterable/Seq for Scala; if var seq: Seq[(Int, scala.Double)] is chosen then
  org.apache.spark.mllib.linalg.Vectors.sparse(x.size, seq)
}
Can anybody help? Thanks in advance.
Upvotes: 2
Views: 529
Reputation: 658
In Mahout 0.13.0 you can also use MahoutCollections:
import org.apache.mahout.math.Vector
import org.apache.mahout.math.scalabindings.MahoutCollections._
val a = Array(1.0, 2.0, 3.0)
val v: Vector = new org.apache.mahout.math.DenseVector(a)
v.toArray // Array[Double] with the vector's values
You can then pass the resulting array to Spark, for example with org.apache.spark.mllib.linalg.Vectors.dense, to build a Spark Vector.
Upvotes: 1
Reputation: 8529
Tuple2 comes from Scala, not Java. In Scala you can use the built-in tuple syntax instead: (IntWritable, VectorWritable) is special syntax for the type Tuple2[IntWritable, VectorWritable].
You can also instantiate your tuples using this syntax. Your Java code
Tuple2<Integer, Double> tuple = new Tuple2<Integer, Double>(e.index(), e.get());
becomes
val tuple = (e.index(), e.get())
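As a small standalone illustration of the tuple syntax (the object name and the values are made up for the example, not taken from the question), both spellings below produce the same value:

```scala
object TupleSyntaxDemo {
  // (Int, Double) is shorthand for the type Tuple2[Int, Double].
  val explicit: Tuple2[Int, Double] = new Tuple2[Int, Double](3, 0.5)
  val sugared: (Int, Double) = (3, 0.5)

  // Both forms build equal values; fields are read with _1 and _2.
  val sameValue: Boolean = explicit == sugared
  val index: Int = sugared._1
  val value: Double = sugared._2
}
```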
You can use ArrayList from Scala if you like, nothing will stop you, but it's generally preferred to use the Scala collections, as they have more features and work better with the rest of Scala. scala.collection.mutable.ArrayBuffer is the Scala equivalent of java.util.ArrayList.
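If you do want to keep the Java-style loop, a minimal sketch with ArrayBuffer could look like this. The `source` data is a hypothetical stand-in for the vector's elements, not something from the question:

```scala
import scala.collection.mutable.ArrayBuffer

object ArrayBufferDemo {
  // Hypothetical (index, value) pairs standing in for the vector's elements.
  val source = Seq((0, 1.5), (1, 0.0), (2, 4.0))

  // Java-style accumulation: append in a loop, skipping zero values.
  val tupleList = ArrayBuffer.empty[(Int, Double)]
  for ((index, value) <- source if value != 0) {
    tupleList += ((index, value))
  }
}
```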
However, it's not common in Scala to add things to a collection in a loop like you would in Java. Usually you would use immutable collections and methods like map, flatMap, and filter to transform and generate new collections. In your case you can use
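import scala.collection.JavaConverters._

// nonZeroes returns a Java Iterable, so wrap its iterator with asScala
// before calling the Scala collection methods on it.
val tupleList = x.nonZeroes.iterator().asScala
  .filter(e => e.index < nCols)
  .filter(e => e.get != 0)
  .map(e => (e.index, e.get))
  .toSeq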
This generates the sequence of (index, value) tuples that you can pass to Vectors.sparse.
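To try the filter/map approach without Mahout or Spark on the classpath, here is a self-contained sketch. `Elem` is a hypothetical stand-in for Mahout's element type, and `toTuples` and `FilterMapDemo` are names invented for this example. It uses scala.collection.JavaConverters; on Scala 2.13 or later you would probably import scala.jdk.CollectionConverters._ instead.

```scala
import scala.collection.JavaConverters._

object FilterMapDemo {
  // Hypothetical stand-in for Mahout's element type (index and value only).
  final case class Elem(index: Int, get: Double)

  // Keep entries inside the column range that have a non-zero value,
  // then turn each one into an (index, value) tuple.
  def toTuples(elements: java.util.Iterator[Elem], nCols: Int): Seq[(Int, Double)] =
    elements.asScala
      .filter(e => e.index < nCols)
      .filter(e => e.get != 0)
      .map(e => (e.index, e.get))
      .toList // force evaluation so the result no longer depends on the iterator

  val input = java.util.Arrays.asList(Elem(0, 1.0), Elem(1, 0.0), Elem(2, 3.5), Elem(7, 2.0))

  // Index 7 is out of range and index 1 holds a zero, so both are dropped.
  val result = toTuples(input.iterator(), nCols = 5) // List((0,1.0), (2,3.5))
}
```

The sketch ends with toList rather than toSeq so that the tuples are materialized immediately; whether that matters for a real Mahout iterator is an assumption worth checking against your version.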
Upvotes: 3