Reputation: 796
I am new to Spark programming. I have a data file named "test1.in" that contains random numbers, one per line, like this -
123
34
1
45
65
I want to sort these numbers using Spark and write the output to a new file. Here is my code so far -
import org.apache.spark.{SparkContext, SparkConf}
val conf = new SparkConf().setMaster("local[*]").setAppName("SortingApp")
val sc = new SparkContext(conf)
val data = sc.textFile("src/main/resources/test1.in")
val d1 = data.map(_.sorted)
d1.foreach(println _)
The result is not what I expected.
Upvotes: 0
Views: 2656
Reputation: 37832
When you call:
data.map(_.sorted)
you map each record (which is a String) to its "sorted" version, meaning the String is treated as a sequence of characters and those characters are sorted within that single record.
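A quick check in the Scala REPL makes this concrete (a minimal sketch; the values are illustrative):
"123".sorted  // "123" -- the characters happen to be in order already
"321".sorted  // "123" -- the characters are reordered within the record
"45".sorted   // "45"  -- each record is sorted in isolation; records never move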
What you need is not map, which applies your function to each record separately (and hence cannot sort across records), but RDD.sortBy:
data.map(_.toInt).sortBy(t => t)
The t => t is the identity function, returning the input as-is; it can be replaced with Scala's built-in generic implementation:
data.map(_.toInt).sortBy(identity)
Or, the shortest version:
data.sortBy(_.toInt)
(which returns a result of type RDD[String])
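Putting this together with the original goal of writing the sorted numbers to a new file, here is a minimal sketch (the output path "src/main/resources/test1.out" is a hypothetical choice):
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[*]").setAppName("SortingApp")
val sc = new SparkContext(conf)

// Parse each line as an Int and sort across the whole RDD
val sorted = sc.textFile("src/main/resources/test1.in")
  .map(_.toInt)
  .sortBy(identity)

// saveAsTextFile writes a directory of part files, not a single file
sorted.saveAsTextFile("src/main/resources/test1.out")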
Upvotes: 2
Reputation: 4256
Use the line below to convert the text file data to Int and then sort it (note that an RDD has no sorted method, so use sortBy):
val d1 = data.map(_.toInt).sortBy(identity)
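If you need descending order instead, RDD.sortBy also takes an ascending flag (a small sketch building on the line above):
val d2 = data.map(_.toInt).sortBy(identity, ascending = false)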
Upvotes: 0