Reputation: 139
I have a Cassandra table with a few columns and I want to update one of them (and, for that matter, how would I do it for multiple columns?) from Spark 2.4.0. But if I don't provide all the columns, the records are not getting updated.
Cassandra schema:
rowkey,message,number,timestamp,name
1,hello,12345,12233454,ABC
The point is that the Spark DataFrame contains the rowkey together with the updated timestamp that has to be written to the Cassandra table.
I tried to select the columns right after the options call, but it seems there is no such method on the writer.
finalDF.select("rowkey","current_ts")
.withColumnRenamed("current_ts","timestamp")
.write
.format("org.apache.spark.sql.cassandra")
.options(Map("table" -> "table_data", "keyspace" -> "ks_data"))
.mode("overwrite")
.option("confirm.truncate","true")
.save()
Say finalDF is:
rowkey,current_ts
1,12233999
Then, after the update, the Cassandra table should hold:
rowkey,message,number,timestamp,name
1,hello,12345,12233999,ABC
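For reference, a DataFrame shaped like that example could be built as follows (illustrative only; column names follow the question, and the literal types are assumptions, adjust them to your actual Cassandra schema):
import spark.implicits._
// Illustrative only: a two-column DataFrame with the partition key
// and the new timestamp value to write back to Cassandra
val finalDF = Seq(("1", 12233999L)).toDF("rowkey", "current_ts")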
I'm using the DataFrame API, so the RDD approach cannot be used. How can I do this? Cassandra version 3.11.3, DataStax connector 2.4.0-2.11.
Upvotes: 2
Views: 1943
Reputation: 29237
Clarification: SaveMode is used to specify the expected behavior of saving a DataFrame to a data source (not only for C* but for any data source). The available options are
- SaveMode.ErrorIfExists
- SaveMode.Append
- SaveMode.Overwrite
- SaveMode.Ignore
In this case, since you already have data and you want to append to it, you have to use SaveMode.Append:
import org.apache.spark.sql.SaveMode
finalDF.select("rowkey","current_ts")
.withColumnRenamed("current_ts","timestamp")
.write
.format("org.apache.spark.sql.cassandra")
.options(Map("table" -> "table_data", "keyspace" -> "ks_data"))
.mode(SaveMode.Append)
.option("confirm.truncate","true")
.save()
See the Spark docs on SaveMode.
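Since Cassandra writes are upserts, appending only rowkey and the renamed timestamp column updates just that column for the matching row. To confirm the result, you could read the row back through the connector (a minimal sketch, assuming the same keyspace/table names and a SparkSession named spark; adjust the key literal's type to your schema):
import org.apache.spark.sql.functions.col

// Read the table back via the Cassandra connector and inspect the updated row
val updated = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "table_data", "keyspace" -> "ks_data"))
  .load()
  .filter(col("rowkey") === "1")

updated.show() // message, number and name should be untouched; timestamp should now be 12233999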
Upvotes: 0