userxxx

Reputation: 796

How to sort a text file containing integers in Spark - Scala?

I am new to Spark programming. I have a data file named "test1.in" which contains random numbers in the following way -

123
34
1
45
65

I want to sort these numbers using Spark and write the output to a new file. Here is my code so far -

import org.apache.spark.{SparkContext, SparkConf}

val conf = new SparkConf().setMaster("local[*]").setAppName("SortingApp")
val sc = new SparkContext(conf)

val data = sc.textFile("src/main/resources/test1.in")
val d1 = data.map(_.sorted)
d1.foreach(println _)

The result is not what is expected.

Upvotes: 0

Views: 2656

Answers (2)

Tzach Zohar

Reputation: 37832

When you call:

data.map(_.sorted)

You map each record (which is a String) into its "sorted" version: the String is treated as a sequence of characters, and those characters are sorted within each record.
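To see this concretely, here is a small plain-Scala illustration (using ordinary collections rather than an RDD, but `String.sorted` behaves the same either way): each record's characters get sorted internally, while the order of the records themselves never changes.

```scala
object CharSortDemo {
  def main(args: Array[String]): Unit = {
    // "sorted" on a String sorts its characters, not its numeric value
    println("321".sorted)  // prints "123" -- the chars '3','2','1' reordered

    // applied via map, each record is char-sorted in place;
    // the collection's order is untouched
    val records = List("123", "34", "1")
    println(records.map(_.sorted))  // List(123, 34, 1) -- still unsorted as numbers
  }
}
```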

What you need is NOT map, which applies your function to each record separately (and therefore cannot reorder the records), but RDD.sortBy:

data.map(_.toInt).sortBy(t => t)

The t => t is the identity function, returning its input as-is; it can be replaced with Scala's built-in generic implementation:

data.map(_.toInt).sortBy(identity)

Or, the shortest version:

data.sortBy(_.toInt)

(which would return a result of type RDD[String], since the values are converted to Int only for comparison)
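For intuition, the same distinction shown with plain Scala collections (a sketch; the RDD `sortBy` behaves analogously, and in the actual Spark job you could follow the sort with `saveAsTextFile` to write the result to a new file, as the question asks):

```scala
object NumericSortDemo {
  def main(args: Array[String]): Unit = {
    val data = List("123", "34", "1", "45", "65")

    // plain lexicographic sort compares strings char by char: "123" < "34"
    println(data.sorted)          // List(1, 123, 34, 45, 65) -- wrong order

    // sortBy(_.toInt) compares numeric values but keeps the String elements
    println(data.sortBy(_.toInt)) // List(1, 34, 45, 65, 123)
  }
}
```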

Upvotes: 2

justAbit

Reputation: 4256

Use the line below to convert the text file's records to Int and then sort them (note that an RDD has no .sorted method, so use sortBy):

val d1 = data.map(_.toInt).sortBy(identity)

Upvotes: 0
