SarahB

Reputation: 328

Apache Spark in Scala not printing rdd values

I am new to both Spark and Scala, so this might be a very basic question.

I created a text file with 4 lines of words. The rest of the code is below:

val data = sc.textFile("file:///home//test.txt").map(x=> x.split(" "))

println(data.collect)
println(data.take(2))
println(data.collect.foreach(println))

All of the above println calls produce output like: [Ljava.lang.String;@1ebec410

Any idea how I can display the actual contents of the RDD? I have even tried saveAsTextFile, but it also saves the same java... lines.

I am using the IntelliJ IDE for Spark and Scala, and yes, I have gone through other related posts, but they did not help. Thanks in advance.

Upvotes: 1

Views: 4002

Answers (2)

Akash Sethi

Reputation: 2294

The type of data is RDD[Array[String]]. You were printing an Array[String], which prints something like [Ljava.lang.String;@1ebec410, because arrays do not override toString(). What you see is only the JVM type name followed by the object's hash code.

You can convert the Array[String] to a List[String] with toList, which is available on arrays through an implicit conversion. You will then see the contents, because List overrides toString() in Scala to show its elements.

That means that if you try

data.collect.foreach(arr => println(arr.toList))

it will show you the contents. Alternatively, as @Raphael has suggested:

data.collect().foreach(arr => println(arr.mkString(", ")))

This also works, because arr.mkString(", ") joins the elements of the array into a single String, separated by ", ".
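To see the difference without Spark at all, here is a minimal sketch in plain Scala. The object name ArrayPrintDemo and the sample sentence are made up for illustration; only split, toList and mkString come from the discussion above.

```scala
object ArrayPrintDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical sample line, standing in for one line of the text file.
    val words: Array[String] = "hello spark world".split(" ")

    // Arrays do not override toString, so this prints the JVM default,
    // something of the form [Ljava.lang.String;@<hash>.
    println(words)

    // List overrides toString and shows its elements.
    println(words.toList) // List(hello, spark, world)

    // mkString joins the elements into a single String.
    println(words.mkString(", ")) // hello, spark, world
  }
}
```

The same two conversions work on each element of the collected RDD, since every element there is an Array[String].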

Hope this clears up your doubt. Thanks.

Upvotes: 2

Raphael Roth

Reputation: 27383

data is of type RDD[Array[String]], so what you print is the toString of each Array[String] ([Ljava.lang.String;@1ebec410). Try this:

data.collect().foreach(arr => println(arr.mkString(", ")))
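The question also mentions that saveAsTextFile wrote the same [Ljava... lines. That is expected, because saveAsTextFile writes the toString of each element. A rough sketch of one way around it is to map each array to a String on the RDD itself before collecting or saving. Everything named here is an assumption for illustration: the object PrintRddLines, the helper formatLine, the local[*] master, the in-memory sample lines used instead of the original file, and the commented-out output path.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PrintRddLines {
  // Hypothetical helper: turn one split line back into readable text.
  def formatLine(words: Array[String]): String = words.mkString(", ")

  def main(args: Array[String]): Unit = {
    // Assumption: local mode and a made-up application name.
    val conf = new SparkConf().setAppName("print-rdd-lines").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // In-memory stand-in for sc.textFile(...) from the question.
      val data = sc.parallelize(Seq("a b c", "d e")).map(_.split(" "))

      // RDD[Array[String]] becomes RDD[String], so each element prints readably.
      val readable = data.map(formatLine)
      readable.collect().foreach(println)

      // Saving the mapped RDD writes the joined text, not the array's toString.
      // The path is hypothetical, and the directory must not already exist.
      // readable.saveAsTextFile("/tmp/rdd-out")
    } finally {
      sc.stop()
    }
  }
}
```

Note also that foreach returns Unit, so wrapping it in another println, as in the last line of the question, prints an extra () after the elements.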

Upvotes: 0
