Reputation: 18013
A question that I feel may benefit others.
If I run
val rdd1 = sc.parallelize( List( "a", "b", "c", "d", "e"))
val rdd1a = rdd1.map(x => (x, 110, 110 - x.toByte ))
rdd1: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[55] at parallelize at <console>:44
rdd1a: org.apache.spark.rdd.RDD[(String, Int, Int)] = MapPartitionsRDD[56] at map at <console>:46
it works.
As soon as I add collect
val rdd1 = sc.parallelize( List( "a", "b", "c", "d", "e"))
val rdd1a = rdd1.map(x => (x, 110, 110 - x.toByte )).collect()
it fails.
The logic sort of escapes me really. Who can clarify? It is an RDD so?
Upvotes: 0
Views: 1629
Reputation: 23109
The error is here
val rdd1a = rdd1.map(x => (x, 110, 110 - x.toByte ))
Since x
is string
and you are trying to change it to Byte
what you should do is
val rdd1a = rdd1.map(x => (x, 110, 110 - x.toCharArray()(0).toByte ))
This did not failed here
val rdd1a = rdd1.map(x => (x, 110, 110 - x.toByte ))
because this is a lazy evaluation, it is not executed, collect
is an action. After action is performed the code gets executed as well.
Hope this helps
Upvotes: 2