Ged

Reputation: 18013

RDD collect() failure

A question that I feel may benefit others.

If I run

val rdd1  = sc.parallelize( List( "a", "b", "c", "d", "e")) 
val rdd1a = rdd1.map(x => (x, 110, 110 - x.toByte ))

rdd1: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[55] at parallelize at <console>:44
rdd1a: org.apache.spark.rdd.RDD[(String, Int, Int)] = MapPartitionsRDD[56] at map at <console>:46

it works.

As soon as I add collect

val rdd1  = sc.parallelize( List( "a", "b", "c", "d", "e")) 
val rdd1a = rdd1.map(x => (x, 110, 110 - x.toByte )).collect()

it fails.

The logic escapes me. Can anyone clarify? Both snippets work on the same RDD, so why does only the second one fail?

Upvotes: 0

Views: 1629

Answers (1)

koiralo

Reputation: 23109

The error is here

val rdd1a = rdd1.map(x => (x, 110, 110 - x.toByte ))

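x is a String, and calling toByte on a String tries to parse the whole string as a number, so "a".toByte throws a NumberFormatException.

What you should do instead is take the first character and convert that to a Byte: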

val rdd1a = rdd1.map(x => (x, 110, 110 - x.toCharArray()(0).toByte ))
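
To make the difference concrete, here is a minimal plain-Scala sketch (no Spark needed) contrasting the two conversions. The object name `ToByteDemo` and the helpers `parsed` and `firstCharByte` are made up for illustration, and `firstCharByte` assumes a non-empty string.

```scala
import scala.util.Try

object ToByteDemo {
  // String.toByte parses the whole string as a number,
  // so it only succeeds for text such as "42".
  def parsed(s: String): Try[Byte] = Try(s.toByte)

  // Hypothetical helper: byte value of the first character.
  // Assumes s is non-empty; s.head throws on an empty string.
  def firstCharByte(s: String): Byte = s.head.toByte

  def main(args: Array[String]): Unit = {
    println(parsed("42"))              // Success(42)
    println(parsed("a").isFailure)     // true (NumberFormatException)
    println(firstCharByte("a"))        // 97
    println(110 - firstCharByte("a"))  // 13
  }
}
```

`x.head.toByte` should behave the same as `x.toCharArray()(0).toByte` for non-empty strings; which one to use is mostly a matter of taste.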

The original line did not fail here

val rdd1a = rdd1.map(x => (x, 110, 110 - x.toByte ))

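because map is a transformation and transformations are evaluated lazily: Spark only records what to do and runs nothing yet. collect is an action, and only when an action is called does the function passed to map actually execute, which is when the exception surfaces.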
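
The same effect can be reproduced without a cluster. The sketch below uses a plain Scala Iterator as a stand-in for the RDD, which is only an analogy: Iterator.map is also lazy, and toList plays the role of collect. The names `LazyDemo`, `build` and `force` are made up for illustration.

```scala
import scala.util.Try

object LazyDemo {
  val letters = List("a", "b", "c", "d", "e")

  // Like rdd.map: only describes the work. The function is not
  // run here, so the bad toByte call cannot fail yet.
  def build(): Iterator[(String, Int, Int)] =
    letters.iterator.map(x => (x, 110, 110 - x.toByte))

  // Like collect(): forces evaluation, so the function runs
  // and the NumberFormatException finally appears.
  def force(): Try[List[(String, Int, Int)]] = Try(build().toList)

  def main(args: Array[String]): Unit = {
    val pending = build()          // no exception: nothing evaluated
    println(force().isFailure)     // true
  }
}
```

In Spark itself the failure would normally reach the driver wrapped in a job-failure exception rather than as a bare NumberFormatException, but the timing is the same: nothing runs until the action.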

Hope this helps

Upvotes: 2
