Reputation: 87
This is my first time using Scala and Apache Spark for a project. I'm trying to print the contents of a matrix when I run my code in the terminal, but nothing I've tried has worked so far.
Instead I only get this printed:
org.apache.spark.mllib.linalg.distributed.MatrixEntry;@71870da7
org.apache.spark.mllib.linalg.distributed.CoordinateMatrix@1dcca8d3
I'm just using println(), and when I use collect(), that doesn't give a good result either.
Upvotes: 3
Views: 5033
Reputation: 71
scala> val rdd1 = sc.parallelize(List(1,2,3,4)).map(_*2)
To print the data within the RDD:
scala> rdd1.collect().foreach(println)
Output (one value per line): 2 4 6 8
Upvotes: 0
Reputation: 63062
Building on @zero323's comment (as an aside, would you like to post an answer?): given an RDD[SomeType], you can call
rdd.collect()
or
rdd.take(k)
You can then print the results using the normal toString() methods, which depend on the type of the RDD's contents. So if SomeType were a List[Double], then
println(s"${rdd.collect().mkString(",")}")
would give you a single-line, comma-separated output of the results.
As @zero323 notes, another consideration is: "do you really want to print out the contents of your rdd?" More likely you might only want a summary - such as
println(s"Number of entries in RDD is ${rdd.count()}")
Upvotes: 1
Reputation: 3072
The default toString prints the name of the class followed by the object's hash code in hexadecimal (it looks like a memory address, but it is not guaranteed to be one):
org.apache.spark.mllib.linalg.distributed.MatrixEntry;@71870da7
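The effect can be reproduced without Spark. The sketch below is only an illustration: `PlainEntry` and `CaseEntry` are hypothetical stand-ins invented here, not Spark classes. It shows that a plain class and an array both fall back to the default toString, while a case class prints its fields.

```scala
object ToStringDemo {
  // Hypothetical stand-ins for illustration only; these are not Spark classes.
  class PlainEntry(val i: Long, val j: Long, val value: Double)
  case class CaseEntry(i: Long, j: Long, value: Double)

  def main(args: Array[String]): Unit = {
    val plain = new PlainEntry(0L, 1L, 2.5)
    val entries = Array(CaseEntry(0L, 1L, 2.5), CaseEntry(1L, 0L, 4.0))

    // Default toString: class name, '@', then the hash code in hex.
    println(plain)

    // Arrays also use the default toString, so this prints a type tag
    // and a hash code rather than the elements.
    println(entries)

    // Case classes generate a readable toString, so printing the
    // elements one at a time shows the field values.
    entries.foreach(println)

    // mkString joins the elements into a single line.
    println(entries.mkString(", "))
  }
}
```

The second `println` is the same situation as printing the result of `collect()` directly: the array itself is printed, not what it contains.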
You're going to want to find a way to iterate through your matrix and print each element.
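One way to do that for a CoordinateMatrix is to go through its `entries` RDD, bring the entries to the driver, and print them one by one. The following is a minimal sketch, not the asker's code: it assumes Spark MLlib is on the classpath and a local master is acceptable, and the object name `PrintMatrix`, the helper `render`, and the sample values are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

object PrintMatrix {
  // Hypothetical helper: formats one entry as "(row, col) -> value".
  def render(e: MatrixEntry): String = s"(${e.i}, ${e.j}) -> ${e.value}"

  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust the master for a real cluster.
    val conf = new SparkConf().setAppName("PrintMatrix").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data for the sketch.
      val entries = sc.parallelize(Seq(
        MatrixEntry(0, 0, 1.0),
        MatrixEntry(0, 1, 2.5),
        MatrixEntry(1, 1, 4.0)
      ))
      val mat = new CoordinateMatrix(entries)

      // Print a summary first; this does not pull the data to the driver.
      println(s"Matrix is ${mat.numRows()} x ${mat.numCols()} " +
        s"with ${mat.entries.count()} stored entries")

      // collect() returns an Array[MatrixEntry]; print its elements,
      // not the array itself.
      mat.entries.collect().foreach(e => println(render(e)))
    } finally {
      sc.stop()
    }
  }
}
```

Since `collect()` copies every entry to the driver, for a large matrix it is probably safer to replace it with `take(k)` for some small `k` and print only a sample.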
Upvotes: 1