user706838

Reputation: 5380

How to select certain field dynamically in Scala \ Spark?

I am not sure the title describes my problem accurately, but here it is:

dump is of type:

dump: org.apache.spark.rdd.RDD[(String, String, String, String)]

for example:

val dump = sc.parallelize(List(("a","b","c","s")))

and I have the following for-loop:

for (i <- List(0, 1, 2, 3)) {
  val temp = dump.map(x => x._i)
}

But IntelliJ indicates there is an error in x._i. Any ideas?

Upvotes: 0

Views: 313

Answers (1)

Bhashit Parikh

Reputation: 3131

IntelliJ is correct in pointing out that you are using an incorrect syntax.

What you are trying to do can be achieved with something like:

for (i <- List(0, 1, 2, 3)) {
  val temp = dump.map(x => x.productElement(i))
}

Tuples are instances of a class (Tuple4 in this case), not arrays that you can access by index. Also, Scala, unlike some other languages such as JavaScript, doesn't allow string-based property access (unless you want to use reflection). What you are trying could work, with some syntactic changes, in a language like JS, but not in Scala.

However, at least in this case, the same thing can be achieved with the productElement method, because all tuples are also instances of Product, which provides facilities to iterate over the elements or access them by index. Note that the index is zero-based: index 0 corresponds to ._1, index 1 to ._2, and so on. Also note that productElement returns Any, so temp above is an RDD[Any] rather than an RDD[String].
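To see the index mapping without a Spark context, here is a minimal plain-Scala sketch of productElement and productIterator on a 4-tuple. The object name ProductElementDemo and the helpers elementAt and allElements are made up for illustration; only productElement, productIterator and productArity come from Product.

```scala
object ProductElementDemo {
  // A 4-tuple shaped like the elements of `dump` in the question.
  val row: (String, String, String, String) = ("a", "b", "c", "s")

  // productElement is zero-based: index 0 is ._1, index 3 is ._4.
  // The result is typed as Any, so the String type is lost here.
  def elementAt(i: Int): Any = row.productElement(i)

  // productIterator walks all elements in order, also as Any.
  def allElements: List[Any] = row.productIterator.toList

  def main(args: Array[String]): Unit = {
    // productArity is the number of elements in the tuple (4 here).
    for (i <- 0 until row.productArity)
      println(s"index $i -> ${elementAt(i)}")
  }
}
```

Calling productElement with an index outside 0 until productArity throws an IndexOutOfBoundsException, so looping up to productArity is safer than hard-coding the list of indices.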

Also, with reference to the comment by @Archeg, there is a limit to what you can put into tuples. There are tuple classes ranging from Tuple1 to Tuple22, which means that a tuple can contain at most 22 elements.
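Since productElement returns Any, one alternative worth considering when every field has the same type is to loop over a list of accessor functions instead of a list of indices. The sketch below uses plain Scala collections; the names TypedSelectorsDemo, Row, selectors and project are hypothetical. I would expect the same functions to work with dump.map(select) on the RDD, but I have not run that here.

```scala
object TypedSelectorsDemo {
  // Alias for the tuple type from the question.
  type Row = (String, String, String, String)

  // One accessor per field; each keeps the String result type.
  val selectors: List[Row => String] =
    List[Row => String](_._1, _._2, _._3, _._4)

  // For each accessor, extract that field from every row.
  def project(rows: List[Row]): List[List[String]] =
    selectors.map(select => rows.map(select))

  def main(args: Array[String]): Unit = {
    val rows: List[Row] = List(("a", "b", "c", "s"))
    project(rows).foreach(println)
  }
}
```

This trades the generality of index-based access for type safety: the compiler checks every accessor, and each result stays a collection of String.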

Upvotes: 4
