Catherine

Reputation: 87

How do I print the contents of an Apache Spark RDD in my terminal?

This is my first time using Scala and Apache Spark for a project. I'm trying to print the contents of a matrix when I run my code in the terminal, but nothing I have tried has worked so far.

Instead I only get this printed:

org.apache.spark.mllib.linalg.distributed.MatrixEntry;@71870da7
org.apache.spark.mllib.linalg.distributed.CoordinateMatrix@1dcca8d3

I am just using println(), and when I use collect() that doesn't give a good result either.

Upvotes: 3

Views: 5033

Answers (4)

kuldeep singh

Reputation: 71

scala> val rdd1 = sc.parallelize(List(1, 2, 3, 4)).map(_ * 2)

To print the data within the RDD:

scala> rdd1.collect().foreach(println)

Output:

2
4
6
8

Upvotes: 0

elm

Reputation: 20415

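Iterate over the RDD like this. Note that on a cluster the output goes to each executor's stdout rather than the driver's, so this shows up in your terminal only when running in local mode: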

rdd.foreach(println)

Upvotes: 0

WestCoastProjects

Reputation: 63062

Building on @zero323's comment (as an aside, would you like to post an answer?): given an RDD[SomeType] you can call

 rdd.collect()

or

 rdd.take(k)

Then you can print the results using the normal toString() methods, which depend on the type of the RDD's contents. So if SomeType were a List[Double], then

println(s"${rdd.collect().mkString(",")}") 

would give you a single-line, comma-separated output of the results.

As @zero323 points out, another consideration is: "do you really want to print out the contents of your RDD?" More likely you only want a summary, such as

println(s"Number of entries in RDD is ${rdd.count()}")
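For the CoordinateMatrix in the question, the same idea might look roughly like the sketch below. It assumes Spark MLlib is on the classpath and that a local master is acceptable; the object name PrintCoordinateMatrix, the sample helper and the three entries are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Hypothetical example object; the name and data are illustrative only.
object PrintCoordinateMatrix {

  // Bring at most k entries to the driver and format them one per line.
  // MatrixEntry is a case class, so each entry has a readable toString.
  def sample(matrix: CoordinateMatrix, k: Int): String =
    matrix.entries.take(k).mkString("\n")

  def main(args: Array[String]): Unit = {
    // Assumption: running locally; drop setMaster when using spark-submit.
    val conf = new SparkConf().setAppName("PrintCoordinateMatrix").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val entries = sc.parallelize(Seq(
        MatrixEntry(0L, 0L, 1.0),
        MatrixEntry(0L, 1L, 2.0),
        MatrixEntry(1L, 1L, 3.0)))
      val matrix = new CoordinateMatrix(entries)

      // Print a bounded sample instead of the whole matrix.
      println(sample(matrix, 10))

      // Print a summary, which is often all that is needed.
      println(s"Number of entries in RDD is ${matrix.entries.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

Using take(k) rather than collect() keeps the amount of data pulled back to the driver bounded, which matters once the matrix is larger than a toy example.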

Upvotes: 1

Michael Lafayette

Reputation: 3072

The default toString prints the name of the class followed by "@" and the object's hash code in hexadecimal, not the object's contents.

org.apache.spark.mllib.linalg.distributed.MatrixEntry;@71870da7

You're going to want to find a way to iterate through your matrix and print each element.
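To illustrate the difference without needing Spark, here is a small plain-Scala sketch. Entry is a hypothetical stand-in for MatrixEntry (which is also a case class), and ToStringDemo and render are names invented for this example.

```scala
// Hypothetical stand-in for Spark's MatrixEntry so this runs without Spark.
final case class Entry(i: Long, j: Long, value: Double)

object ToStringDemo {

  // Render each element on its own line using the element's own toString.
  def render(entries: Array[Entry]): String = entries.mkString("\n")

  def main(args: Array[String]): Unit = {
    val entries = Array(Entry(0L, 0L, 1.0), Entry(0L, 1L, 2.0))

    // Arrays use the default toString, so this prints something like
    // "[LEntry;@1b2c3d4": an array type name, "@", and a hex hash code.
    println(entries)

    // Case classes override toString, so printing element by element
    // shows the fields, e.g. Entry(0,0,1.0).
    println(render(entries))
  }
}
```

With a real CoordinateMatrix the equivalent would be along the lines of matrix.entries.collect().foreach(println), since entries is an RDD of MatrixEntry and collect() returns an array of them.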

Upvotes: 1
