vishal kumar

Reputation: 19

Apache-spark: How to pass Iterable[String] as parameter in a function

I used groupByKey on a key-value pair RDD and got output of type RDD[(String, Iterable[String])].

I am calling a function in a map transformation on the above output, but I am getting an error in the function declaration:

def getStr (uid : String, locations : Array[]) : String = {
   return "test"
}

I don't know how to use Iterable[String] as a function parameter.
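For context, a minimal sketch of the setup described above (the data, variable names, and SparkContext `sc` are assumptions, not from the question):

```scala
import org.apache.spark.rdd.RDD

// Hypothetical key-value data: (user id, location)
val pairs = sc.parallelize(Seq(("u1", "NY"), ("u1", "LA"), ("u2", "SF")))

// groupByKey yields RDD[(String, Iterable[String])]
val grouped: RDD[(String, Iterable[String])] = pairs.groupByKey()

// grouped.map(...) expects a function taking one argument of the
// record type (String, Iterable[String]) - passing a two-parameter
// function like getStr(uid, locations) does not compile as-is.
```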

Upvotes: 1

Views: 917

Answers (1)

Tzach Zohar

Reputation: 37852

If your RDD has type RDD[(String, Iterable[String])], then to map its records you'll need a function that receives a single argument of the same type as the RDD's records, i.e.:

def getStr(record: (String, Iterable[String])): String = { "test" }

If you're wondering what can be done with an Iterable, see the Scala docs for scala.collection.Iterable.

Sometimes the simplest approach (though not necessarily the best performing) is to convert it to a list, which gives you a richer API, e.g. to get the first item, or a default if it's empty:

def getStr(record: (String, Iterable[String])): String = record match { 
  case (s, iter) => iter.toList.headOption.getOrElse("UNKNOWN") 
}
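If you'd rather keep separate uid and locations parameters, you can destructure the record inside the map and pass the Iterable[String] directly; a sketch under the same assumptions (`grouped` is the hypothetical RDD[(String, Iterable[String])] from the question, "UNKNOWN" is an arbitrary default):

```scala
// Two-parameter variant: Iterable[String] is an ordinary parameter type
def getStr(uid: String, locations: Iterable[String]): String =
  locations.toList.headOption.getOrElse("UNKNOWN")

// Destructure each (key, values) record and forward both parts
val result = grouped.map { case (uid, locs) => getStr(uid, locs) }
```

Either style works; the key point is that map hands you the whole record, so a multi-parameter function must be adapted via pattern matching (or tupled) before it fits.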

Upvotes: 0
