Reputation: 199
How to set following Cassandra write parameters in spark scala code for version - DataStax Spark Cassandra Connector 1.6.3.
Spark version - 1.6.2
spark.cassandra.output.batch.size.rows
spark.cassandra.output.concurrent.writes
spark.cassandra.output.batch.size.bytes
spark.cassandra.output.batch.grouping.key
Thanks, Chandra
Upvotes: 0
Views: 1281
Reputation: 2996
The most flexible way is to add those variables in a file, such as spark.conf:
spark.cassandra.output.concurrent.writes 10
etc... and then create your spark context in your app with something like:
val conf = new SparkConf()
val sc = new SparkContext(conf)
and finally, when you submit your app, you can specify your properties file with:
spark-submit --properties-file spark.conf ...
Spark will automatically read your configuration from spark.conf when creating the spark context That way, you can modify the properties on your spark.conf without needing to recompile your code each time.
Upvotes: 0
Reputation: 1801
In DataStax Spark Cassandra Connector 1.6.X, you can pass these parameters as part of your SparkConf
.
val conf = new SparkConf(true)
.set("spark.cassandra.connection.host", "192.168.123.10")
.set("spark.cassandra.auth.username", "cassandra")
.set("spark.cassandra.auth.password", "cassandra")
.set("spark.cassandra.output.batch.size.rows", "100")
.set("spark.cassandra.output.concurrent.writes", "100")
.set("spark.cassandra.output.batch.size.bytes", "100")
.set("spark.cassandra.output.batch.grouping.key", "partition")
val sc = new SparkContext("spark://192.168.123.10:7077", "test", conf)
You can refer to this readme for more information.
Upvotes: 3