theMadKing

Reputation: 2074

Spark Key/Value filter Function

I have data in key-value pairs. I am trying to apply a filter function to the data that looks like:

  def filterNum(x: Int) : Boolean = {
    if (decimalArr.contains(x)) return true
    else return false
  }

My Spark code has:

val numRDD = columnRDD.filter(x => filterNum(x(0)))

but that won't work, and when I send in:

val numRDD = columnRDD.filter(x => filterNum(x))

I get the error:

<console>:23: error: type mismatch;
 found   : (Int, String)
 required: Int
       val numRDD = columnRDD.filter(x => filterNum(x))

I have also tried other things, like changing the inputs to the function.

Upvotes: 1

Views: 12833

Answers (1)

Justin Pihony

Reputation: 67135

This is because RDD.filter passes in the whole key-value tuple, (Int, String), while filterNum expects an Int. That is also why the first attempt fails: Scala tuples do not support positional indexing like x(0); their elements are accessed with x._1 and x._2.
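If you want to keep filterNum as it is, a minimal fix at the call site (a sketch, assuming decimalArr is a collection of Int already in scope) is:

val numRDD = columnRDD.filter(x => filterNum(x._1))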

You could change your filter function to be

def filterNum(x: (Int, String)) : Boolean = {
  if (decimalArr.contains(x._1)) return true
  else return false
}
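With this signature, the second call from your question compiles unchanged:

val numRDD = columnRDD.filter(x => filterNum(x))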

That said, I would personally use a terser version, since contains already produces the Boolean you need and you can pass the expression directly:

columnRDD.filter(decimalArr.contains(_._1))

Or, if you don't like the underscore syntax:

columnRDD.filter(x=>decimalArr.contains(x._1))

Also, avoid return in Scala; the last evaluated expression is the return value automatically.
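For completeness, here is a minimal self-contained sketch of the terse version; the sample data, the decimalArr values, and the local[*] master are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}

object FilterByKeyExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("FilterByKey").setMaster("local[*]"))

    // Hypothetical stand-ins for decimalArr and columnRDD
    val decimalArr = Seq(1, 3)
    val columnRDD = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))

    // Keep only the pairs whose key appears in decimalArr
    val numRDD = columnRDD.filter(x => decimalArr.contains(x._1))
    numRDD.collect().foreach(println) // prints (1,a) and (3,c)

    sc.stop()
  }
}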

Upvotes: 8
