Sangeen Khan

Reputation: 175

How can we sort line-by-line data from a txt file using Apache Spark with Scala?

The log.txt file contains :

cat,black,dog,apple,red
zoo,apple,red,blue,green
apple,green,zoo,black,walk

My code is:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

object ScalaApp {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[4]", "Program")

    val data = sc.textFile("data.txt").flatMap(line => line.split(","))
    val d1 = data.map(_.sorted)
    d1.foreach(print _)
  }
}

I want the following result:

  apple,black,cat,dog,red
  apple,blue,green,red,zoo
  apple,black,green,walk,zoo 

but my code gives the result:

 actabckldgoaelppderoozaelppderbelueegnraelppeegnroozabcklaklw

Kindly provide a solution!

Upvotes: 0

Views: 1621

Answers (4)

Sangeen Khan

Reputation: 175

Finally, I have solved the problem and want to share the solution with you all:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

object ScalaApp {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[4]", "Program")

    val data = sc.textFile("data.txt").flatMap(_.split("\n"))
    val lc = data.count().toInt                 // number of lines in the file
    val d1 = data.flatMap(line => line.split(",").sorted)
    d1.foreach(println)

    // Collect the flattened, per-line-sorted words and rebuild the lines,
    // six words at a time (every line of data.txt has exactly six words).
    val a = d1.toArray
    var loop = 0
    for (i <- 0 to lc - 1) {
      println(a(loop) + " " + a(loop + 1) + " " + a(loop + 2) + " " +
              a(loop + 3) + " " + a(loop + 4) + " " + a(loop + 5))
      loop = loop + 6
    }
  }
}

The data.txt file contains:

cat,black,dog,apple,red,cat
zoo,apple,red,blue,green,cat
apple,green,zoo,black,walk,cat

The result is :

apple black cat cat dog red
apple blue cat green red zoo
apple black cat green walk zoo

And that was the desired output!
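As a side note, the manual six-word loop above can be sketched with `grouped` on plain Scala collections (so it runs without a Spark context). This is only an illustration under the same fixed-width assumption; the `words` array is a hypothetical stand-in for the contents of `d1.toArray`:

```scala
object GroupedSketch extends App {
  // Stand-in for d1.toArray: the per-line-sorted words of all three lines, flattened.
  val words = Array(
    "apple", "black", "cat", "cat", "dog", "red",
    "apple", "blue", "cat", "green", "red", "zoo",
    "apple", "black", "cat", "green", "walk", "zoo"
  )

  // grouped(6) rebuilds one line per six words, replacing the manual index loop.
  val lines = words.grouped(6).map(_.mkString(" ")).toList
  lines.foreach(println)
}
```

This still assumes every line has exactly six words; `grouped` just removes the hand-maintained `loop` counter.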

Upvotes: 0

WoodChopper

Reputation: 4375

You have to first split by line, then by ","

val data = sc.textFile("data.txt")
             .map(word=> word.split(",")).map(_.sorted)
data.collect()
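The same per-line transformation can be sketched on plain Scala collections (so it runs without a Spark context), with `mkString` joining each sorted line back into the comma-separated form the question asks for; the `lines` value here is a hypothetical stand-in for the file contents:

```scala
object SortLinesSketch extends App {
  // Stand-in for sc.textFile("data.txt"): one String per line of the file.
  val lines = List(
    "cat,black,dog,apple,red",
    "zoo,apple,red,blue,green",
    "apple,green,zoo,black,walk"
  )

  // map (not flatMap) keeps one element per line; sort the words of each line,
  // then rejoin them with commas.
  val sorted = lines.map(_.split(",").sorted.mkString(","))
  sorted.foreach(println)   // apple,black,cat,dog,red ... and so on
}
```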

Upvotes: 1

Jason Scott Lenderman

Reputation: 1918

This should create an RDD[Array[String]] where each element of the RDD is an array containing the tokens of a line of text sorted in ascending order:

val data = sc.textFile("log.txt").map(line => line.split(",").sorted)

Also, be aware that if you do a data.foreach(println) the output will go to stdout of the workers, not stdout of the driver.

but my code give result as :

actabckldgoaelppderoozaelppderbelueegnraelppeegnroozabcklaklw

You're getting that mess because data is an RDD[String] (because you're doing a flatMap instead of a map). So when you do data.map(_.sorted) it sorts the characters of each String in data, e.g. "apple" becomes "aelpp", etc. Use map instead of flatMap.
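To see the difference outside Spark, here is a sketch of the same two operations on a plain Scala list; `lines` is a hypothetical stand-in for the RDD contents:

```scala
object FlatMapVsMap extends App {
  val lines = List("cat,black", "zoo,apple")

  // flatMap flattens to individual words, so .sorted then acts on each
  // word's characters: "zoo".sorted == "ooz", "apple".sorted == "aelpp".
  val flat = lines.flatMap(_.split(",")).map(_.sorted)
  println(flat)      // List(act, abckl, ooz, aelpp)

  // map keeps one array per line, so .sorted orders the words of that line.
  val perLine = lines.map(_.split(",").sorted)
  perLine.foreach(a => println(a.mkString(",")))   // black,cat  then  apple,zoo
}
```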

Upvotes: 0

Shadowlands

Reputation: 15074

Try changing the line defining d1 to:

val d1=data.map(_.sorted)
d1.foreach(println _)

Upvotes: 1
