I am stuck on a problem. When I write sample Cassandra connection code, importing the Cassandra connector gives an error.
I am starting the script with the commands below (both of them give the error):
./spark-submit --jars spark-cassandra-connector_2.11-1.6.0-M1.jar /home/beyhan/sparkCassandra.py
./spark-submit --jars spark-cassandra-connector_2.10-1.6.0.jar /home/beyhan/sparkCassandra.py
But the following import fails:
import pyspark_cassandra
with this error:
ImportError: No module named pyspark_cassandra
What did I do wrong?
Note: I have already installed the Cassandra database.
You are mixing up DataStax's Spark Cassandra Connector (the jar you add to spark-submit) and TargetHolding's PySpark Cassandra project (which provides the pyspark_cassandra module). The latter is deprecated, so you should probably use the Spark Cassandra Connector. See the connector's documentation for details on using it from Python.
To use it, you can add the following flags to spark-submit:
--conf spark.cassandra.connection.host=127.0.0.1 \
--packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M3
Of course, use the IP address on which Cassandra is listening, and check which connector version you need: 2.0.0-M3 was the latest version when this answer was written, and it works with Spark 2.0 and most Cassandra versions. See the compatibility table if you are using a different version of Spark. The 2.10 or 2.11 suffix is the Scala version your Spark build uses: Spark 2.x is built with Scala 2.11 by default, while releases before 2.x used Scala 2.10.
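To make the naming rule concrete, here is a small hypothetical helper (not part of Spark or the connector) that assembles the `--packages` coordinate from a Spark version string. It encodes only the two defaults described above (Spark 1.x uses Scala 2.10, Spark 2.x uses Scala 2.11) and rejects anything else; it does not check the compatibility table, so the connector version you pass in still has to match your Spark release.

```python
# Hypothetical helper, for illustration only: builds the Maven coordinate
# passed to spark-submit --packages. It assumes the default Scala builds
# described above and knows nothing about other Spark or Scala versions.

def default_scala_version(spark_version: str) -> str:
    """Return the default Scala binary version for a Spark release like '2.0.2'."""
    major = int(spark_version.split(".")[0])
    if major == 1:
        return "2.10"
    if major == 2:
        return "2.11"
    raise ValueError(f"no default Scala version known for Spark {spark_version}")


def connector_coordinate(spark_version: str, connector_version: str) -> str:
    """Build 'com.datastax.spark:spark-cassandra-connector_<scala>:<version>'."""
    scala = default_scala_version(spark_version)
    return f"com.datastax.spark:spark-cassandra-connector_{scala}:{connector_version}"


if __name__ == "__main__":
    print(connector_coordinate("2.0.2", "2.0.0-M3"))
    # prints com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M3
```

For a Spark 1.6 installation, `connector_coordinate("1.6.0", "1.6.0")` gives `com.datastax.spark:spark-cassandra-connector_2.10:1.6.0`, matching the Scala suffix of the second jar in the question.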
Then the nicest way to work with the connector is to use it to read DataFrames, which looks like this:
sqlContext.read\
.format("org.apache.spark.sql.cassandra")\
.options(table="kv", keyspace="test")\
.load().show()
See the connector's "PySpark with DataFrames" documentation for more details.
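As a sketch of how the question's `sparkCassandra.py` could look without the `pyspark_cassandra` import, assuming Spark 2.x and the `test` keyspace with a `kv` table from the snippet above: the connection host is set on the session builder, which should be equivalent to passing `--conf spark.cassandra.connection.host=...`. The function names `build_session` and `read_table` are hypothetical; only the `SparkSession` builder and `spark.read.format(...).options(...).load()` calls come from PySpark.

```python
# Sketch of a connector-based script; build_session/read_table are
# illustrative names, not part of PySpark or the connector.

CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def build_session(app_name, host):
    """Create a SparkSession that points the connector at a Cassandra host."""
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        # Same setting as --conf spark.cassandra.connection.host=... on spark-submit.
        .config("spark.cassandra.connection.host", host)
        .getOrCreate()
    )


def read_table(spark, keyspace, table):
    """Load a Cassandra table as a DataFrame through the connector's data source."""
    return (
        spark.read
        .format(CASSANDRA_FORMAT)
        .options(table=table, keyspace=keyspace)
        .load()
    )
```

A script launched with `./spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M3 /home/beyhan/sparkCassandra.py` could then end with `read_table(build_session("sparkCassandra", "127.0.0.1"), "test", "kv").show()`. Keeping the reader in its own function means the table and keyspace are the only things that change between queries.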