Reputation: 23
I am new to Cassandra and Spark, and I am trying to load data from a file into a Cassandra table using a Spark master cluster. I am following the steps given in the link below:
http://docs.datastax.com/en/datastax_enterprise/4.7/datastax_enterprise/spark/sparkImportTxtCQL.html
At step 8 the data is shown as an integer array, but when I run the same command the result comes back as an array of strings: Array[Array[String]] = Array(Array(6, 7, 8))
After applying an explicit conversion, for example:
scala> val arr = Array("1", "12", "123")
arr: Array[String] = Array(1, 12, 123)
scala> val intArr = arr.map(_.toInt)
intArr: Array[Int] = Array(1, 12, 123)
the result is shown in this format:
res24: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[7] at map at <console>:33
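For reference, the nested Array[Array[String]] from step 8 can be converted the same way by mapping over each inner row. This is a minimal sketch using plain Scala collections outside Spark; the sample values are taken from the output above:

```scala
// The shape returned in step 8: an array of rows, each row an array of strings.
val rows: Array[Array[String]] = Array(Array("6", "7", "8"))

// Convert every element of every row from String to Int.
val intRows: Array[Array[Int]] = rows.map(_.map(_.toInt))

println(intRows.map(_.mkString(", ")).mkString)  // prints 6, 7, 8
```

The same `_.map(_.toInt)` function can be passed to `RDD.map` on an `RDD[Array[String]]` to get an `RDD[Array[Int]]`.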
Now, after retrieving data from it using the take function, or applying any other action to it, the following error occurs:
15/09/10 17:21:23 INFO SparkContext: Starting job: take at :36
15/09/10 17:21:23 INFO DAGScheduler: Got job 23 (take at :36) with 1 output partitions (allowLocal=true)
15/09/10 17:21:23 INFO DAGScheduler: Final stage: ResultStage 23(take at :36)
15/09/10 17:21:23 INFO DAGScheduler: Parents of final stage: List()
15/09/10 17:21:23 INFO DAGScheduler: Missing parents: List()
15/09/10 17:21:23 INFO DAGScheduler: Submitting ResultStage 23 (MapPartitionsRDD[7] at map at :33), which has no missing parents
15/09/10 17:21:23 INFO MemoryStore: ensureFreeSpace(3448) called with curMem=411425, maxMem=257918238
15/09/10 17:21:23 INFO MemoryStore: Block broadcast_25 stored as values in memory (estimated size 3.4 KB, free 245.6 MB)
15/09/10 17:21:23 INFO MemoryStore: ensureFreeSpace(2023) called with curMem=414873, maxMem=257918238
15/09/10 17:21:23 INFO MemoryStore: Block broadcast_25_piece0 stored as bytes in memory (estimated size 2023.0 B, free 245.6 MB)
15/09/10 17:21:23 INFO BlockManagerInfo: Added broadcast_25_piece0 in memory on 192.168.1.137:57524 (size: 2023.0 B, free: 245.9 MB)
15/09/10 17:21:23 INFO SparkContext: Created broadcast 25 from broadcast at DAGScheduler.scala:874
15/09/10 17:21:23 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 23 (MapPartitionsRDD[7] at map at :33)
15/09/10 17:21:23 INFO TaskSchedulerImpl: Adding task set 23.0 with 1 tasks
15/09/10 17:21:23 INFO TaskSetManager: Starting task 0.0 in stage 23.0 (TID 117, 192.168.1.138, PROCESS_LOCAL, 1512 bytes)
15/09/10 17:21:23 INFO BlockManagerInfo: Added broadcast_25_piece0 in memory on 192.168.1.138:34977 (size: 2023.0 B, free: 265.4 MB)
15/09/10 17:21:23 WARN TaskSetManager: Lost task 0.0 in stage 23.0 (TID 117, 192.168.1.138): java.lang.ClassNotFoundException: $line67.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:274)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:66)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
15/09/10 17:21:23 INFO TaskSetManager: Starting task 0.1 in stage 23.0 (TID 118, 192.168.1.137, PROCESS_LOCAL, 1512 bytes)
15/09/10 17:21:23 INFO BlockManagerInfo: Added broadcast_25_piece0 in memory on 192.168.1.137:57296 (size: 2023.0 B, free: 265.4 MB)
15/09/10 17:21:23 INFO TaskSetManager: Lost task 0.1 in stage 23.0 (TID 118) on executor 192.168.1.137: java.lang.ClassNotFoundException ($line67.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 1]
15/09/10 17:21:23 INFO TaskSetManager: Starting task 0.2 in stage 23.0 (TID 119, 192.168.1.137, PROCESS_LOCAL, 1512 bytes)
15/09/10 17:21:23 INFO TaskSetManager: Lost task 0.2 in stage 23.0 (TID 119) on executor 192.168.1.137: java.lang.ClassNotFoundException ($line67.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 2]
15/09/10 17:21:23 INFO TaskSetManager: Starting task 0.3 in stage 23.0 (TID 120, 192.168.1.138, PROCESS_LOCAL, 1512 bytes)
15/09/10 17:21:23 INFO TaskSetManager: Lost task 0.3 in stage 23.0 (TID 120) on executor 192.168.1.138: java.lang.ClassNotFoundException ($line67.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 3]
15/09/10 17:21:23 ERROR TaskSetManager: Task 0 in stage 23.0 failed 4 times; aborting job
15/09/10 17:21:23 INFO TaskSchedulerImpl: Removed TaskSet 23.0, whose tasks have all completed, from pool
15/09/10 17:21:23 INFO TaskSchedulerImpl: Cancelling stage 23
15/09/10 17:21:23 INFO DAGScheduler: ResultStage 23 (take at :36) failed in 0.184 s
15/09/10 17:21:23 INFO DAGScheduler: Job 23 failed: take at :36, took 0.194861 s
15/09/10 17:21:23 INFO BlockManagerInfo: Removed broadcast_24_piece0 on 192.168.1.137:57524 in memory (size: 1963.0 B, free: 245.9 MB)
15/09/10 17:21:23 INFO BlockManagerInfo: Removed broadcast_24_piece0 on 192.168.1.138:34977 in memory (size: 1963.0 B, free: 265.4 MB)
15/09/10 17:21:23 INFO BlockManagerInfo: Removed broadcast_24_piece0 on 192.168.1.137:57296 in memory (size: 1963.0 B, free: 265.4 MB)
15/09/10 17:21:23 INFO BlockManagerInfo: Removed broadcast_23_piece0 on 192.168.1.137:57524 in memory (size: 2.2 KB, free: 245.9 MB)
15/09/10 17:21:23 INFO BlockManagerInfo: Removed broadcast_23_piece0 on 192.168.1.138:34977 in memory (size: 2.2 KB, free: 265.4 MB)
15/09/10 17:21:23 INFO BlockManagerInfo: Removed broadcast_23_piece0 on 192.168.1.137:57296 in memory (size: 2.2 KB, free: 265.4 MB)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 23.0 failed 4 times, most recent failure: Lost task 0.3 in stage 23.0 (TID 120, 192.168.1.138): java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:274)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:66)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Thanks in advance for your help.
Upvotes: 1
Views: 434
Reputation: 31
It seems that you don't have the connector driver on your classpath.
Look at this part of the stack trace:
java.lang.ClassNotFoundException:
at java.lang.Class.forName(Class.java:274)
Please review your project and check whether the Cassandra connector is among your dependencies.
I hope this helps.
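If the jar is missing on the executors, one way to make it available is to pass it when starting the REPL. This is only a sketch: the jar name, path, and version below are assumptions, so substitute the artifact that matches your Spark and Cassandra versions:

```shell
# Start the Spark REPL with the Cassandra connector jar distributed to
# the driver and all executors. The path/version here is an example only.
spark-shell --jars /path/to/spark-cassandra-connector-assembly-1.4.0.jar
```

The same `--jars` option works with `spark-submit` for packaged applications.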
Upvotes: 1