zehra özdemir

Reputation: 1

How To Use Spark Submit Operator With Cassandra Remote Server In Apache Airflow

I'm running Airflow in a Docker container on a Windows PC, and I'm having problems with the Apache Airflow SparkSubmitOperator. I want to write data to a remote Cassandra server.
When I call df.write.save(), I get "An error occurred while calling o41.save."
The strange thing is that I can read data and show the schema, but I can't save the data. Does anyone have an idea what is going wrong?

Here is my Spark configuration:

from pyspark.sql import SparkSession

spark = SparkSession \
        .builder \
        .master("local[*]") \
        .appName("example") \
        .config("spark.cassandra.connection.host","10.0.0.1") \
        .config("spark.cassandra.connection.port","9042") \
        .config("spark.cassandra.auth.username","pc1") \
        .config("spark.cassandra.auth.password","1234") \
        .config("spark.jars.packages","/opt/spark/jars/spark-cassandra-connector_2.12-3.4.0.jars") \
        .getOrCreate()

I want to write CSV data from the Docker container to the remote Cassandra server using Airflow and Spark.

Upvotes: 0

Views: 60

Answers (1)

Erick Ramirez

Reputation: 16293

You didn't provide (1) the full error message with (2) the accompanying full stack trace and (3) a minimal sample code that replicates the issue so I'm going to assume that there are missing dependencies in your app.

I should note that since version 2.5 of the Spark Cassandra connector, there are optimisations (CassandraSparkExtensions) which you should enable in your app.

Instead of building with the connector JAR, I suggest specifying the package coordinates for the Spark Cassandra connector with the --packages option so that all dependencies are included in your app.

For example, try launching a PySpark shell with:

$ bin/pyspark \
  --master <spark_master_url> \
  --conf spark.cassandra.connection.host=cassandra_host_ip \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.0 \
  --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions
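
If you would rather keep the settings in the script, the same options can go into the session builder. This is only a minimal sketch: the host, port and credentials are copied from the question, while the CSV path, keyspace and table are placeholders I made up, and I'm assuming the target table already exists. Note that spark.jars.packages takes Maven coordinates, not a file path.

```python
# Connection values are the ones from the question; the connector coordinates
# and the extensions class are the ones used in the shell command.
CASSANDRA_CONF = {
    "spark.cassandra.connection.host": "10.0.0.1",
    "spark.cassandra.connection.port": "9042",
    "spark.cassandra.auth.username": "pc1",
    "spark.cassandra.auth.password": "1234",
    # Maven coordinates (group:artifact:version), not a path to a JAR file.
    "spark.jars.packages": "com.datastax.spark:spark-cassandra-connector_2.12:3.4.0",
    "spark.sql.extensions": "com.datastax.spark.connector.CassandraSparkExtensions",
}


def build_session(conf=CASSANDRA_CONF):
    """Create a local SparkSession with the Cassandra connector settings applied."""
    from pyspark.sql import SparkSession  # imported lazily so the dict can be reused elsewhere

    builder = SparkSession.builder.master("local[*]").appName("example")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def write_csv_to_cassandra(spark, csv_path, keyspace, table):
    """Read a CSV file and append its rows to an existing Cassandra table."""
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    (
        df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .mode("append")
        .save()
    )


# Hypothetical usage; the path, keyspace and table names are placeholders:
# spark = build_session()
# write_csv_to_cassandra(spark, "/path/to/data.csv", "my_keyspace", "my_table")
```

In the PySpark shell the `spark` session already exists, so only `write_csv_to_cassandra` would be needed there.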

You can then test your code to see if you can write to Cassandra. Cheers!
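
For the Airflow side, the SparkSubmitOperator has a packages argument that plays the role of --packages. The following is a rough sketch and not something I have run against your setup: it assumes the apache-spark provider package is installed in the Airflow container, and the task id, application path and connection id are hypothetical values you would replace with your own.

```python
# Keyword arguments for SparkSubmitOperator. The task id, application path and
# connection id are hypothetical placeholders; the host is from the question.
SUBMIT_KWARGS = {
    "task_id": "csv_to_cassandra",
    "application": "/opt/airflow/dags/csv_to_cassandra.py",
    "conn_id": "spark_default",
    # Equivalent of --packages: Maven coordinates resolved at submit time.
    "packages": "com.datastax.spark:spark-cassandra-connector_2.12:3.4.0",
    # Equivalent of repeated --conf key=value options.
    "conf": {
        "spark.cassandra.connection.host": "10.0.0.1",
        "spark.sql.extensions": "com.datastax.spark.connector.CassandraSparkExtensions",
    },
}


def make_submit_task(dag=None):
    """Build the operator; Airflow is imported lazily so this file loads without it."""
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    return SparkSubmitOperator(dag=dag, **SUBMIT_KWARGS)
```

Inside a DAG file you would call `make_submit_task(dag)` (or create the operator directly in a `with DAG(...)` block) so the task is registered with the scheduler.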

Upvotes: 1
