Reputation: 1
I'm running Airflow in a Docker container on a Windows PC, and I'm having some problems with the Apache Airflow SparkSubmitOperator. I want to write data to a remote Cassandra server.
When I call df.write.save()
I get the error An error occurred while calling o41.save.
The strange thing is that I can read data and show the schema, but I can't save it. Does anyone have an idea what might be wrong?
I also want to share my Spark configuration:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master("local[*]") \
    .appName("example") \
    .config("spark.cassandra.connection.host", "10.0.0.1") \
    .config("spark.cassandra.connection.port", "9042") \
    .config("spark.cassandra.auth.username", "pc1") \
    .config("spark.cassandra.auth.password", "1234") \
    .config("spark.jars.packages", "/opt/spark/jars/spark-cassandra-connector_2.12-3.4.0.jars") \
    .getOrCreate()
I want to write CSV data from the Docker container to the remote Cassandra server with an Airflow-orchestrated Spark job.
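For reference, the write step looks roughly like this (a minimal sketch of what I am attempting; the CSV path, keyspace and table names are placeholders):

# Read the CSV file (path is a placeholder)
df = spark.read.csv("/opt/airflow/data/example.csv", header=True, inferSchema=True)

# Write to Cassandra; the keyspace and table are placeholders and must already exist
df.write \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="example_table", keyspace="example_ks") \
    .mode("append") \
    .save()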
Upvotes: 0
Views: 60
Reputation: 16293
You didn't provide (1) the full error message, (2) the accompanying full stack trace, and (3) a minimal code sample that replicates the issue, so I'm going to assume that there are missing dependencies in your app.
I should note that since version 2.5 of the Spark Cassandra connector, there are optimisations (CassandraSparkExtensions) which you should include in your app.
Instead of pointing to the connector JAR directly, I suggest specifying the package coordinates for the Spark Cassandra connector with the --packages option so that all of its dependencies are included in your app.
For example, try launching a PySpark shell with:
$ bin/pyspark \
  --master <spark_master_url> \
  --conf spark.cassandra.connection.host=cassandra_host_ip \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.0 \
  --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions
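Once the shell is up, a quick write test could look something like this (just a sketch; the keyspace and table below are placeholders that must already exist in your cluster):

# Minimal write test with a throwaway DataFrame
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="test_table", keyspace="test_ks") \
    .mode("append") \
    .save()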
You can then test your code to see if you can write to Cassandra. Cheers!
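As a side note, since you're launching the job through the Airflow SparkSubmitOperator, the same settings can be passed via its packages and conf arguments. This is only a rough sketch; the task_id, application path and conn_id are placeholders, and it assumes the apache-airflow-providers-apache-spark package is installed:

from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Sketch only: task_id, application path and conn_id are placeholders
write_to_cassandra = SparkSubmitOperator(
    task_id="write_to_cassandra",
    application="/opt/airflow/dags/scripts/write_csv_to_cassandra.py",
    conn_id="spark_default",
    packages="com.datastax.spark:spark-cassandra-connector_2.12:3.4.0",
    conf={
        "spark.cassandra.connection.host": "10.0.0.1",
        "spark.sql.extensions": "com.datastax.spark.connector.CassandraSparkExtensions",
    },
)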
Upvotes: 1