Chandra

Reputation: 199

Spark Cassandra tuning

How do I set the following Cassandra write parameters in Spark Scala code for DataStax Spark Cassandra Connector 1.6.3?

Spark version - 1.6.2

spark.cassandra.output.batch.size.rows

spark.cassandra.output.concurrent.writes

spark.cassandra.output.batch.size.bytes

spark.cassandra.output.batch.grouping.key

Thanks, Chandra

Upvotes: 0

Views: 1281

Answers (2)

Christophe Schmitz

Reputation: 2996

The most flexible way is to put those properties in a file, such as spark.conf:

spark.cassandra.output.concurrent.writes 10

and so on, and then create your Spark context in your app with something like:

val conf = new SparkConf()
val sc = new SparkContext(conf)

and finally, when you submit your app, you can specify your properties file with:

spark-submit --properties-file spark.conf ...

Spark will automatically read your configuration from spark.conf when creating the Spark context. That way, you can modify the properties in spark.conf without needing to recompile your code each time.
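For reference, a spark.conf covering the four write parameters from the question could look like the sketch below; the values are illustrative placeholders, not recommendations, so tune them for your own cluster and workload:

spark.cassandra.output.batch.size.rows 100
spark.cassandra.output.concurrent.writes 10
spark.cassandra.output.batch.size.bytes 1024
spark.cassandra.output.batch.grouping.key partition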

Upvotes: 0

suj1th

Reputation: 1801

In DataStax Spark Cassandra Connector 1.6.X, you can pass these parameters as part of your SparkConf.

val conf = new SparkConf(true)
    // Cassandra connection and authentication
    .set("spark.cassandra.connection.host", "192.168.123.10")
    .set("spark.cassandra.auth.username", "cassandra")
    .set("spark.cassandra.auth.password", "cassandra")
    // Write tuning parameters
    .set("spark.cassandra.output.batch.size.rows", "100")
    .set("spark.cassandra.output.concurrent.writes", "100")
    .set("spark.cassandra.output.batch.size.bytes", "100")
    .set("spark.cassandra.output.batch.grouping.key", "partition")

val sc = new SparkContext("spark://192.168.123.10:7077", "test", conf)
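For completeness, here is a minimal sketch of a write that would pick up these settings; the keyspace, table, and column names are placeholders, not anything from the question:

import com.datastax.spark.connector._

// Save a small RDD to Cassandra; the spark.cassandra.output.* settings above
// control how the connector batches and parallelizes these writes.
sc.parallelize(Seq((1, "one"), (2, "two")))
  .saveToCassandra("my_keyspace", "my_table", SomeColumns("id", "value"))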

You can refer to the connector's configuration documentation for more information.

Upvotes: 3
