user1050325

Reputation: 1272

Iterative lookup from within rdd.map in Scala

def retrieveindex (stringlist: List[String], lookuplist: List[String]) = 
  stringlist.foreach(y => lookuplist.indexOf(y))

is my function.

I am trying to use this within an rdd like this:

val libsvm = libsvmlabel.map(x => 
  Array(x._2._2,retrieveindex(x._2._1.toList,featureSet.toList)))

However, the output is empty. There is no error, but retrieveindex returns nothing. When I add a println inside it to check the lookup, I do see the indices printed. Is there any way to do this? Should I first 'distribute' the function to all the workers? I am a newbie.

Upvotes: 0

Views: 70

Answers (1)

Jean Logeart

Reputation: 53839

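retrieveindex has return type Unit, because foreach applies the given function to each element only for its side effects and discards the results (its signature is foreach[U](f: A => U): Unit). The indices are computed and then thrown away, so the enclosing map receives () instead of a list of indices.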

You probably want it to return the list of indices, like:

def retrieveindex(stringlist: List[String], lookuplist: List[String]): List[Int] = 
  stringlist.map(y => lookuplist.indexOf(y))
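
To make the difference concrete, here is a minimal sketch that runs on plain Scala collections, without Spark. The name retrieveindexUnit, the Demo object, and the sample lists are hypothetical and only serve the illustration; the two function bodies are the ones from the question and from this answer.

```scala
object Demo {
  // Version from the question: foreach discards every indexOf result,
  // so the function returns Unit.
  def retrieveindexUnit(stringlist: List[String], lookuplist: List[String]): Unit =
    stringlist.foreach(y => lookuplist.indexOf(y))

  // Version from the answer: map collects every indexOf result into a List[Int].
  def retrieveindex(stringlist: List[String], lookuplist: List[String]): List[Int] =
    stringlist.map(y => lookuplist.indexOf(y))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, not taken from the original post.
    val featureSet = List("a", "b", "c")
    val tokens = List("c", "a", "z")

    // The only value this can hold is (), whatever the inputs are.
    val nothing: Unit = retrieveindexUnit(tokens, featureSet)

    // indexOf returns -1 for elements that are not in the lookup list.
    val indices: List[Int] = retrieveindex(tokens, featureSet)
    println(indices) // prints List(2, 0, -1)
  }
}
```

Note that indexOf returns -1 when a string is missing from the lookup list, so the result may need filtering before it is used as a feature index.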

Upvotes: 3
