user3086871

Reputation: 721

What is scala's version of ArrayList and Tuple?

I am trying to convert the following Java code segment to Scala:

public org.apache.spark.mllib.linalg.Vector call(Tuple2<IntWritable, VectorWritable> arg0)
                        throws Exception {

                    org.apache.mahout.math.Vector mahoutVector = arg0._2.get();
                    Iterator<Element> elements = mahoutVector.nonZeroes().iterator();
                    ArrayList<Tuple2<Integer, Double>> tupleList = new ArrayList<Tuple2<Integer, Double>>();
                    while (elements.hasNext()) {
                        Element e = elements.next();
                        if (e.index() >= nCols || e.get() == 0)
                            continue;
                        Tuple2<Integer, Double> tuple = new Tuple2<Integer, Double>(e.index(), e.get());
                        tupleList.add(tuple);
                    }
                    org.apache.spark.mllib.linalg.Vector sparkVector = Vectors.sparse(nCols, tupleList);
                    return sparkVector;
                }

I am fairly new to Scala, so I don't know how to convert it properly. So far I have:

def transformSvec(x: Vector) : org.apache.spark.mllib.linalg.Vector = {
    val iter=x.nonZeroes.iterator()    
    //iterate the items and add to an arraylist
    //or an iterable/seq for scala, if var seq: Seq[(Int, scala.Double)] is chosen then
    org.apache.spark.mllib.linalg.Vectors.sparse(x.size, seq)
} 

Can anybody help? Thanks in advance.

Upvotes: 2

Views: 529

Answers (2)

rawkintrevo

Reputation: 658

In Mahout 0.13.0 you can also use MahoutCollections:

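import org.apache.mahout.math.{DenseVector, Vector}
import org.apache.mahout.math.scalabindings.MahoutCollections._

val a = Array(1.0, 2.0, 3.0)
val v: Vector = new DenseVector(a)

// toArray is made available on Mahout vectors by the MahoutCollections import
v.toArray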

You can then pass that array to Spark to build a Spark vector, for example with org.apache.spark.mllib.linalg.Vectors.dense(v.toArray).

Upvotes: 1

puhlen

Reputation: 8529

Tuple2 comes from Scala, not Java, so in Scala you can use the built-in tuple syntax: (IntWritable, VectorWritable) is shorthand for the type Tuple2[IntWritable, VectorWritable].

You can also instantiate tuples using this syntax. Your Java code

 Tuple2<Integer, Double> tuple = new Tuple2<Integer, Double>(e.index(), e.get());

becomes

val tuple = (e.index(), e.get())

You can use ArrayList from Scala if you like; nothing will stop you. However, it is generally preferred to use the Scala collections, as they have more features and work better with the rest of Scala. scala.collection.mutable.ArrayBuffer is the Scala equivalent of java.util.ArrayList.
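For comparison, here is a minimal sketch of the Java-style loop written with ArrayBuffer. It uses only the standard library: `Elem` is a hypothetical stand-in for Mahout's vector element (it is not part of Mahout), and `collect` and `nCols` are names chosen for this illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a Mahout vector element; for illustration only.
final case class Elem(index: Int, value: Double)

object ArrayBufferSketch {
  // Imperative style: append tuples to a mutable buffer,
  // much like calling ArrayList.add inside a Java while loop.
  def collect(elems: Iterator[Elem], nCols: Int): Seq[(Int, Double)] = {
    val buf = ArrayBuffer.empty[(Int, Double)]
    while (elems.hasNext) {
      val e = elems.next()
      // Keep only in-range, non-zero entries (same test as the Java code, inverted).
      if (e.index < nCols && e.value != 0) buf += ((e.index, e.value))
    }
    buf.toList // hand back an immutable copy
  }
}
```

The double parentheses in `buf += ((...))` make it explicit that a single tuple is being appended.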

However, it's not common in Scala to add things to a collection in a loop as you would in Java. Usually you would use immutable collections and methods like map, flatMap, and filter to transform existing collections into new ones. In your case you can use

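import scala.collection.JavaConverters._

// nonZeroes yields a Java iterator, so convert it with asScala before filtering
val tupleList = x.nonZeroes.iterator().asScala
  .filter(e => e.index < nCols)
  .filter(e => e.get != 0)
  .map(e => (e.index, e.get))
  .toSeq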

to generate your sequence, which can then be passed to Vectors.sparse.
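The same filter-and-map pattern can be tried without Mahout or Spark on the classpath. The sketch below is an assumption-laden stand-in: `Entry` is a hypothetical replacement for Mahout's element type, and a java.util.ArrayList plays the role of the Java iterator source. It shows that a Java iterator needs `asScala` before Scala's collection methods become available.

```scala
import java.util.{ArrayList => JArrayList}
import scala.collection.JavaConverters._

// Hypothetical stand-in for a Mahout vector element; for illustration only.
final case class Entry(index: Int, value: Double)

object PipelineSketch {
  // Wrap the Java iterator as a Scala iterator, drop out-of-range and
  // zero entries, and turn the rest into (index, value) tuples.
  def toTuples(javaIter: java.util.Iterator[Entry], nCols: Int): Seq[(Int, Double)] =
    javaIter.asScala
      .filter(e => e.index < nCols && e.value != 0)
      .map(e => (e.index, e.value))
      .toList

  // Small sample input built with a plain Java collection.
  def sample(): java.util.Iterator[Entry] = {
    val source = new JArrayList[Entry]()
    source.add(Entry(0, 1.5))
    source.add(Entry(1, 0.0))  // dropped: zero value
    source.add(Entry(7, 2.0))  // dropped: index >= nCols when nCols = 3
    source.add(Entry(2, -3.0))
    source.iterator()
  }
}
```

Calling `PipelineSketch.toTuples(PipelineSketch.sample(), 3)` keeps only the entries at indices 0 and 2. On Scala 2.13 and later, `scala.jdk.CollectionConverters._` is the non-deprecated import for `asScala`.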

Upvotes: 3
