yrjo

Reputation: 103

Selecting multiple arbitrary columns from Scala array using map()

I'm new to Scala (and Spark). I'm trying to read in a csv file and extract multiple arbitrary columns from the data. The following function does this, but with hard-coded column indices:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def readCSV(filename: String, sc: SparkContext): RDD[String] = {
  val input = sc.textFile(filename).map(line => line.split(","))
  // The column indices are hard-coded here:
  input.map(csv => csv(2) + "," + csv(4) + "," + csv(15))
}

Is there a way to use map with an arbitrary number of column indices passed to the function in an array?

Upvotes: 1

Views: 899

Answers (1)

Marth

Reputation: 24812

If you have a sequence of indices, you can map over it and return the corresponding values:

scala> val m = List(List(1,2,3), List(4,5,6))
m: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6))

scala> val indices = List(0,2)
indices: List[Int] = List(0, 2)

// For each inner sequence, get the relevant values
// indices.map(inner) is the same as indices.map(i => inner(i))
scala> m.map(inner => indices.map(inner))
res1: List[List[Int]] = List(List(1, 3), List(4, 6))

// If you want to join all of them use .mkString
scala> m.map(inner => indices.map(inner).mkString(","))
res2: List[String] = List(1,3, 4,6)  // note: this is actually a List of 2 Strings, "1,3" and "4,6"
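Applied to the original readCSV, the same trick gives an arbitrary-index version. One caveat: line.split(",") returns an Array, which (unlike List) does not extend Int => String, so write indices.map(fields(_)) rather than indices.map(fields). Below is a minimal sketch of the selection logic on plain collections (selectColumns is a hypothetical name; with Spark you would apply the same .map body to sc.textFile(filename)):

```scala
// Hypothetical helper: pick arbitrary columns from CSV lines.
// The inner .map is exactly what would go inside readCSV on the RDD.
def selectColumns(lines: Seq[String], indices: Seq[Int]): Seq[String] =
  lines
    .map(_.split(","))                                    // split each line into fields
    .map(fields => indices.map(fields(_)).mkString(","))  // keep only the wanted columns

val rows = Seq("a,b,c,d", "e,f,g,h")
selectColumns(rows, Seq(0, 2))  // Seq("a,c", "e,g")
```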

Upvotes: 2
