Reputation: 7247
I would like to convert the following for loop into a functional Scala method.
for (i <- 15 to 25) {
  count_table_rdd = count_table_rdd.union(
    training_data.map(line => (i + "_" + line(i) + "_" + line(0), 1)).reduceByKey(_ + _))
}
I have tried looking at the foreach method, but I do not want to transform every item, just indices 15 through 25.
Upvotes: 1
Views: 292
Reputation: 37435
Taking this from the Spark perspective, it is better to transform the training_data RDD once instead of looping to select the given columns.
Something like:
training_data.flatMap(line => (15 to 25).map(i => (i + "_" + line(i) + "_" + line(0), 1)))
  .reduceByKey(_ + _)
This will be more efficient than joining pieces of an RDD together using union.
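For reference, here is a self-contained sketch of that approach. The SparkContext setup and sample data are illustrative assumptions; only the flatMap/reduceByKey pipeline comes from the answer above.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("counts").setMaster("local[*]"))

// Hypothetical stand-in for training_data: an RDD[Array[String]] where
// column 0 is a label and columns 15..25 hold the values to count.
val training_data = sc.parallelize(Seq(
  Array("labelA") ++ (1 to 25).map(_.toString),
  Array("labelB") ++ (1 to 25).map(_.toString)
))

// One (column_value_label, 1) pair per line per column, counted in a single pass.
val count_table_rdd = training_data
  .flatMap(line => (15 to 25).map(i => (i + "_" + line(i) + "_" + line(0), 1)))
  .reduceByKey(_ + _)

count_table_rdd.collect().foreach(println)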
Upvotes: 1
Reputation: 2928
You can use a tail-recursive function too, but @Rex's method is the one you should follow. The snippet below assumes the pair type (String, Int) produced by your map; adjust the types of count_table_rdd and res if yours differ.
Tail-recursive version:
import org.apache.spark.rdd.RDD

@annotation.tailrec
def f(start: Int = 15, end: Int = 25,
      res: RDD[(String, Int)] = count_table_rdd): RDD[(String, Int)] = {
  if (start > end) res
  else {
    // Union this column's counts into the accumulator, then recurse.
    val temp = res.union(
      training_data.map(line => (start + "_" + line(start) + "_" + line(0), 1))
        .reduceByKey(_ + _))
    f(start + 1, end, temp)
  }
}
f()
You can specify start and end explicitly too:
f(30, 45)
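Note that union is lazy in Spark: each recursive call only extends the RDD lineage, and nothing is computed until an action runs. A hypothetical usage, assuming the types above:

// Build the combined counts for columns 30..45, then materialize them;
// collect() is the action that actually triggers the computation.
val combined = f(30, 45)
combined.collect().foreach(println)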
Upvotes: 1
Reputation: 167921
You can fold.
val result = (count_table_rdd /: (15 to 25)){ (c, i) => c.union(...) }
If you have a collection of values and you are pushing an accumulator through it, updating it at each step, you should reach for a fold, because that is exactly what a fold does.
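For reference, a sketch of the fold with the elided body filled in from the question's own map/reduceByKey (the RDD shapes are assumed as above). (count_table_rdd /: xs)(op) is symbolic syntax for xs.foldLeft(count_table_rdd)(op):

// Fold indices 15..25 through the accumulator RDD, unioning one
// per-column count RDD at each step.
val result = (count_table_rdd /: (15 to 25)) { (c, i) =>
  c.union(
    training_data
      .map(line => (i + "_" + line(i) + "_" + line(0), 1))
      .reduceByKey(_ + _))
}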
Upvotes: 3