G3M

Reputation: 1031

Compute average of numbers in a text file in spark scala

Let's say I have a file with each line containing a number. How do I find the average of all the numbers in the file in Scala with Spark?

val data = sc.textFile("../../numbers.txt")
val sum = data.reduce( (x,y) => x+y )
val avg = sum/data.count()

The problem here is that x and y are strings. How do I convert them to Long within the reduce function?

Upvotes: 2

Views: 2651

Answers (1)

Yuval Itzchakov

Reputation: 149538

You need to apply an RDD.map that parses the strings before reducing them:

val sum = data.map(_.toInt).reduce(_+_)
val avg = sum / data.count()

But I think you're better off using DoubleRDDFunctions.mean instead of calculating it yourself:

val mean = data.map(_.toInt).mean()
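To see the parse-then-average idea end to end without a Spark cluster, here is a minimal plain-Scala sketch using a local Seq in place of the RDD; the input values are illustrative:

```scala
object AverageSketch {
  def main(args: Array[String]): Unit = {
    val lines = Seq("1", "2", "3", "4")   // stands in for sc.textFile("numbers.txt")
    val nums  = lines.map(_.toLong)       // parse before reducing, as in the answer
    val sum   = nums.reduce(_ + _)        // 10
    val avg   = sum.toDouble / nums.size  // cast to Double to avoid integer division
    println(avg)                          // prints 2.5
  }
}
```

Note the `toDouble` cast: in the snippet above, `sum / data.count()` is Long division, so the fractional part of the average would be truncated. The `mean()` approach returns a Double and avoids this.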

Upvotes: 7
