Reputation: 2074
I have data as key-value pairs. I am trying to apply a filter function to the data that looks like this:
def filterNum(x: Int) : Boolean = {
  if (decimalArr.contains(x)) return true
  else return false
}
My Spark code is:
val numRDD = columnRDD.filter(x => filterNum(x(0)))
but that won't work, and when I instead send in:
val numRDD = columnRDD.filter(x => filterNum(x))
I get the error:
<console>:23: error: type mismatch;
found : (Int, String)
required: Int
val numRDD = columnRDD.filter(x => filterNum(x))
I have also tried other things, like changing the inputs to the function.
Upvotes: 1
Views: 12833
Reputation: 67135
This is because RDD.filter passes in the whole key-value tuple, (Int, String), while filterNum expects an Int. That is also why the first attempt fails: a Scala tuple is not indexed with x(0); its elements are accessed through the positional fields ._1 and ._2.
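To make the shapes concrete, here is a minimal sketch; decimalArr and the sample pairs are made-up stand-ins for your data, and sc is assumed to be an existing SparkContext:
val decimalArr = Seq(1, 3, 5)                                     // stand-in lookup collection
val columnRDD = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c"))) // RDD[(Int, String)]
// filter hands each element to the predicate as the whole tuple,
// so the Int key has to be pulled out with ._1 before the contains check
val numRDD = columnRDD.filter(pair => decimalArr.contains(pair._1))
numRDD.collect()                                                  // Array((1,a), (3,c))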
You could change your filter function to be
def filterNum(x: (Int, String)) : Boolean = {
  if (decimalArr.contains(x._1)) return true
  else return false
}
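With that signature, your second attempt compiles as written, and you can even pass the method directly, since Scala eta-expands it into the function that filter expects:
val numRDD = columnRDD.filter(x => filterNum(x))
// or, passing the method itself:
val numRDD2 = columnRDD.filter(filterNum)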
Personally, though, I would use a terser version, since contains already returns the Boolean you need and you can use the expression directly:
columnRDD.filter(decimalArr.contains(_._1))
Or, if you don't like the underscore syntax:
columnRDD.filter(x=>decimalArr.contains(x._1))
Also, do not use return in Scala; the last evaluated expression is the return value automatically.
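For example, the tuple version of filterNum above collapses to a single expression whose value is the result:
def filterNum(x: (Int, String)): Boolean =
  decimalArr.contains(x._1)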
Upvotes: 8