Let's say I have a file where each line represents a number. How do I find the average of all the numbers in the file in Scala/Spark?
val data = sc.textFile("../../numbers.txt")
val sum = data.reduce( (x,y) => x+y )
val avg = sum/data.count()
The problem is that x and y are Strings. How do I convert them to Long within the reduce function?
You need to apply RDD.map to parse the strings before reducing them:
val sum = data.map(_.toLong).reduce(_ + _)
val avg = sum.toDouble / data.count() // convert to Double, otherwise Long / Long truncates
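To check the parsing and the division without a cluster, here is a minimal Spark-free sketch that applies the same map/reduce steps to a local Seq[String] standing in for the RDD[String] returned by sc.textFile. The object name AverageSketch is hypothetical, and trimming whitespace and skipping blank lines is an assumption about the input file, not something the question specifies.

```scala
// Spark-free sketch: a local Seq[String] stands in for the RDD[String]
// returned by sc.textFile. The map/reduce calls mirror the RDD ones.
object AverageSketch {
  // Parse each line to Long, sum, then divide as Double so the
  // fractional part of the average is not truncated.
  // Assumption: lines may carry stray whitespace or be blank.
  def average(lines: Seq[String]): Double = {
    val nums = lines.map(_.trim).filter(_.nonEmpty).map(_.toLong)
    val sum = nums.reduce(_ + _) // throws on empty input, as RDD.reduce does
    sum.toDouble / nums.size
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1", "2", "4")
    val longSum = lines.map(_.toLong).reduce(_ + _)
    println(longSum / lines.size) // Long / Int is integer division: prints 2
    println(average(lines))       // prints 2.3333333333333335
  }
}
```

The only change that matters for the Spark version is the same: parse with toLong inside map, and convert to Double before dividing by the count.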
But I think you're better off using DoubleRDDFunctions.mean instead of calculating it yourself:
val mean = data.map(_.toDouble).mean()
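One practical reason to prefer mean(): sum / data.count() runs two Spark actions (reduce and count), so the file is scanned twice unless the RDD is cached, while mean() obtains the count and the mean in a single job. The sketch below is a local, hypothetical illustration of such a single-pass running mean, with a merge step for combining per-partition results. RunningMean and Acc are illustrative names, not Spark API, and returning NaN for empty input is a choice made here, not Spark's documented behavior.

```scala
// Local sketch of a single-pass running mean (not Spark code).
// Spark's mean() aggregates per partition and merges the partial
// results; this only mirrors that idea on an ordinary collection.
object RunningMean {
  // Accumulated state: number of values seen and the mean so far.
  final case class Acc(count: Long, mean: Double) {
    // Incremental update: move the mean by (x - mean) / newCount.
    def add(x: Double): Acc = {
      val n = count + 1
      Acc(n, mean + (x - mean) / n)
    }

    // Combine two partial results (e.g. from two partitions),
    // weighting each mean by its count.
    def merge(other: Acc): Acc =
      if (count == 0) other
      else if (other.count == 0) this
      else {
        val n = count + other.count
        Acc(n, (mean * count + other.mean * other.count) / n)
      }
  }

  // Hypothetical choice: an empty input yields NaN.
  def mean(xs: Iterable[Double]): Double = {
    val acc = xs.foldLeft(Acc(0L, 0.0))(_ add _)
    if (acc.count == 0) Double.NaN else acc.mean
  }
}
```

Usage: RunningMean.mean(Seq(1.0, 2.0, 4.0)) gives approximately 2.3333, and Acc(2, 1.5).merge(Acc(1, 4.0)) combines two partial means into the same overall result.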