user896993

Reputation: 1351

Spark 2.0 Cassandra Scala Shell Error: java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class

I have configured the Spark 2.0 shell to run with the DataStax Cassandra connector:

spark-shell --packages datastax:spark-cassandra-connector:2.0.0-M1-35-s_2.11

When I run this snippet in the shell:

sc.stop
import org.apache.spark
import org.apache.spark._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.cassandra
import org.apache.spark.sql.cassandra._
import com.datastax.spark
import com.datastax.spark._
import com.datastax.spark.connector
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql
import com.datastax.spark.connector.cql._
import com.datastax.spark.connector.cql.CassandraConnector
import com.datastax.spark.connector.cql.CassandraConnector._

val conf = new SparkConf(true).set("spark.cassandra.connection.host", "dbserver")
conf.set("spark.cores.max", "1")

val sc = new SparkContext("spark://localhost:7077", "test", conf)
val table = sc.cassandraTable("blackwell", "users")
println(table.count)

On this line:

println(table.count)

I get this error:

[Stage 0:>                                                          (0 + 2) / 6]
16/08/25 11:59:38 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 0.0.0.0): 
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at com.datastax.spark.connector.util.CountingIterator.<init>(CountingIterator.scala:4)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:336)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

Has anyone seen this issue?

Upvotes: 1

Views: 4112

Answers (2)

user896993

Reputation: 1351

I finally got this working. I've added a gist for reference.

https://gist.github.com/ghafran/19d0067d88dc074413422d4cae4cc344

Here is the entire script:

# install java
sudo apt-get update -y
sudo apt-get install software-properties-common -y
sudo add-apt-repository -y ppa:openjdk-r/ppa
sudo apt-get install wget -y
sudo apt-get install openjdk-8-jdk -y
sudo apt-get update -y

# make serve directory
sudo mkdir -p /srv
cd /srv

# install scala 2.11
sudo wget http://downloads.lightbend.com/scala/2.11.7/scala-2.11.7.deb
sudo dpkg -i scala-2.11.7.deb

# get spark 2.0
sudo wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz
sudo tar -zxf spark-2.0.0-bin-hadoop2.7.tgz
sudo mv spark-2.0.0-bin-hadoop2.7 spark

# build spark cassandra connector
echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 642AC823
sudo apt-get install apt-transport-https -y
sudo apt-get update -y
sudo apt-get install sbt -y
git clone https://github.com/datastax/spark-cassandra-connector.git
cd spark-cassandra-connector
git checkout v2.0.0-M2
sudo sbt assembly -Dscala-2.11=true

# move spark cassandra connector to spark jar directory
find . -iname "*.jar" -type f -exec /bin/cp {} /srv/spark/jars/ \;

# start master
/srv/spark/sbin/start-master.sh --host 0.0.0.0

# start slave
/srv/spark/sbin/start-slave.sh --host 0.0.0.0 spark://localhost:7077

# start shell
/srv/spark/bin/spark-shell --driver-class-path $(echo /srv/spark/jars/*.jar | sed 's/ /:/g')

# test
sc.stop
import org.apache.spark
import org.apache.spark._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.cassandra
import org.apache.spark.sql.cassandra._
import com.datastax.spark
import com.datastax.spark._
import com.datastax.spark.connector
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql
import com.datastax.spark.connector.cql._
import com.datastax.spark.connector.cql.CassandraConnector
import com.datastax.spark.connector.cql.CassandraConnector._

val conf = new SparkConf(true).set("spark.cassandra.connection.host", "cassandraserver")
val sc = new SparkContext("spark://localhost:7077", "test", conf)
val table = sc.cassandraTable("keyspace", "users")
println(table.count)
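
As a quick sanity check (a minimal sketch, assuming the build above went through), you can confirm from inside the shell that the driver is actually on Scala 2.11 and Spark 2.0 before querying Cassandra:

// run inside spark-shell: both versions must match what the connector assembly was built for
println(scala.util.Properties.versionString) // should print "version 2.11.x"
println(sc.version)                          // should print "2.0.0"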

Upvotes: 1

T. Gawęda

Reputation: 16086

Spark 2.0 uses Scala 2.11.
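
The missing scala/collection/GenTraversableOnce$class is the classic symptom of a Scala binary mismatch: code compiled against one Scala minor version (typically 2.10) is running on the scala-library of another (2.11), where that trait implementation class does not exist. You can check what your driver is actually running from inside spark-shell:

// inside spark-shell: the Scala version the driver runs on
println(scala.util.Properties.versionNumberString) // e.g. "2.11.8"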

I can't comment yet, so I'll edit this answer once you've answered the additional questions :)

I assume you're running a spark-shell installed on your machine. Could you please run this command in a shell (the system shell, not Spark):

scala -version

Additionally, "spark://localhost:7077" in URL looks like you have Spark Standalone launched. Can you please check if that Spark distribution is build with Scala 2.11?

In my opinion, it would also be better to pass the master via the --master parameter of spark-shell (e.g. spark-shell --master spark://localhost:7077) instead of hard-coding it in the SparkContext constructor.

Upvotes: 0
