Reputation: 175
The log.txt file contains:
cat,black,dog,apple,red
zoo,apple,red,blue,green
apple,green,zoo,black,walk
My code is:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

object ScalaApp {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[4]", "Program")
    val data = sc.textFile("data.txt").flatMap(line => line.split(","))
    val d1 = data.map(_.sorted)
    d1.foreach(print _)
  }
}
I want the following result:
apple,black,cat,dog,red
apple,blue,green,red,zoo
apple,black,green,walk,zoo
But my code gives this result:
actabckldgoaelppderoozaelppderbelueegnraelppeegnroozabcklaklw
Kindly suggest a solution!
Upvotes: 0
Views: 1621
Reputation: 175
I finally solved the problem and want to share the solution with you all:
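import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

object ScalaApp {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[4]", "Program")
    // textFile already yields one element per line, so this flatMap has no effect
    val data = sc.textFile("data.txt").flatMap(_.split("\n"))
    val lc = data.count().toInt // number of lines
    // sort the words within each line, then flatten them into one RDD of words
    val d1 = data.flatMap(line => line.split(",").sorted)
    d1.foreach(println)
    // collect() replaces the deprecated RDD.toArray and brings the words to the driver
    val a = d1.collect()
    var loop = 0
    // print six words per row; this assumes every line has exactly six fields
    for (i <- 0 to lc - 1) {
      println(a(loop) + " " + a(loop + 1) + " " + a(loop + 2) + " " + a(loop + 3) + " " + a(loop + 4) + " " + a(loop + 5))
      loop = loop + 6
    }
  }
}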
The data.txt file contains:
cat,black,dog,apple,red,cat
zoo,apple,red,blue,green,cat
apple,green,zoo,black,walk,cat
The result is:
apple black cat cat dog red
apple blue cat green red zoo
apple black cat green walk zoo
And that was the desired output!!
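The loop above hard-codes six words per row. A minimal sketch of a variant that keeps each line's words together, so the number of columns does not matter. Plain Scala collections stand in for the RDD here, and the object name SortWordsPerLine and the extra short line are only illustrative:

```scala
object SortWordsPerLine {
  // Sort the comma-separated words of one line and join them with spaces.
  def sortLine(line: String): String =
    line.split(",").sorted.mkString(" ")

  def main(args: Array[String]): Unit = {
    // First two lines are copied from data.txt; the third is a made-up shorter line.
    val lines = Seq(
      "cat,black,dog,apple,red,cat",
      "zoo,apple,red,blue,green,cat",
      "walk,apple"
    )
    lines.map(sortLine).foreach(println)
  }
}
```

With an RDD, the same function could probably be passed to map in the same way, followed by collect() to print the rows on the driver.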
Upvotes: 0
Reputation: 4375
textFile already gives you one element per line, so you only need to split each line by "," and then sort the resulting array:
val data = sc.textFile("data.txt")
  .map(line => line.split(",")).map(_.sorted)
data.collect()
Upvotes: 1
Reputation: 1918
This should create an RDD[Array[String]] where each element of the RDD is an array containing the tokens of a line of text sorted in ascending order:
val data = sc.textFile("log.txt").map(line => line.split(",").sorted)
Also, be aware that if you do a data.foreach(println), the output will go to stdout of the workers, not stdout of the driver.
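To see the sorted lines on the driver, one option is to collect() the RDD before printing. A minimal sketch, assuming log.txt is small enough to fit in driver memory; the object name SortedLines, the helper sortTokens, and the app name are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SortedLines {
  // Split one line on commas and sort the tokens in ascending order.
  def sortTokens(line: String): Array[String] = line.split(",").sorted

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("SortedLines")
    val sc = new SparkContext(conf)

    // One Array[String] per input line.
    val data = sc.textFile("log.txt").map(sortTokens)

    // collect() brings the results to the driver, so println writes to the driver's stdout.
    data.collect().foreach(tokens => println(tokens.mkString(",")))

    sc.stop()
  }
}
```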
But my code gives this result:
actabckldgoaelppderoozaelppderbelueegnraelppeegnroozabcklaklw
You're getting that mess because data is an RDD[String] (because you're doing a flatMap instead of a map). So when you do data.map(_.sorted), it sorts the characters of each String in data, e.g. "apple" becomes "aelpp". Use map instead of flatMap.
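The difference is easy to reproduce without Spark, since Scala collections have the same map and flatMap semantics. A small sketch using a Seq in place of the RDD (the object name FlatMapVsMap is only for illustration):

```scala
object FlatMapVsMap {
  val lines = Seq("cat,black,dog,apple,red", "zoo,apple,red,blue,green")

  // flatMap flattens every line into individual words: Seq[String]
  val words: Seq[String] = lines.flatMap(_.split(","))
  // Sorting a String sorts its characters.
  val sortedChars: Seq[String] = words.map(_.sorted)

  // map keeps one array per line: Seq[Array[String]]
  val tokens: Seq[Array[String]] = lines.map(_.split(","))
  // Sorting an Array[String] sorts the words.
  val sortedWords: Seq[Array[String]] = tokens.map(_.sorted)

  def main(args: Array[String]): Unit = {
    println(sortedChars.mkString(" "))
    sortedWords.foreach(ws => println(ws.mkString(",")))
  }
}
```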
Upvotes: 0
Reputation: 15074
Try changing the line defining d1 to:
val d1=data.map(_.sorted)
d1.foreach(println _)
Upvotes: 1