Reputation: 2634
This is a follow-up to the question at Insert Spark Dataset[(String, Map[String, String])] to Cassandra Table.
I have a Spark Dataset of type Dataset[(String, Map[String, String])].
I need to insert it into a Cassandra table.
Here, the key in the Dataset[(String, Map[String, String])] will become the primary key of the row in Cassandra.
The Map in the Dataset[(String, Map[String, String])] will go into the column ColumnNameValueMap of the same row.
My Cassandra table structure is:
CREATE TABLE SampleKeyspace.CassandraTable (
RowKey text PRIMARY KEY,
ColumnNameValueMap map<text,text>
);
I was able to insert the data into the Cassandra table using the Spark Cassandra connector.
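For context, a minimal sketch of how that insert can look (assuming spark-shell with the spark-cassandra-connector package loaded; the lowercase identifiers reflect that Cassandra lowercases unquoted names):

import com.datastax.spark.connector._
// Dataset[(String, Map[String, String])]: the tuple fields map to the named columns in order
val ds = Seq(("row1", Map("k1" -> "v1"))).toDS()
ds.rdd.saveToCassandra("samplekeyspace", "cassandratable", SomeColumns("rowkey", "columnnamevaluemap"))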
Now, I am updating the same map column (2nd column) with new key-value pairs for the same rowkey (1st column/primary key). But every new update to this column overwrites the previous map.
How can I append to the same map using the Spark Cassandra connector?
Upvotes: 1
Views: 782
Reputation: 87174
I don't think that it's possible to do directly from the DataFrame API, but it is possible via the RDD API. For example, I have the following table with some test data:
CREATE TABLE test.m1 (
id int PRIMARY KEY,
m map<int, text>
);
cqlsh> select * from test.m1;

 id | m
----+--------------------
  1 | {1: 't1', 2: 't2'}

(1 rows)
and I have data in Spark:
scala> val data = Seq((1, Map(3 -> "t3"))).toDF("id", "m")
data: org.apache.spark.sql.DataFrame = [id: int, m: map<int,string>]
then I can specify that I want to append the data to a specific column with the following code:
import com.datastax.spark.connector._  // provides SomeColumns and the append/prepend/remove syntax
data.rdd.saveToCassandra("test", "m1", SomeColumns("id", "m" append))
and I can see that the data is updated:
cqlsh> select * from test.m1;
id | m
----+-----------------------------
1 | {1: 't1', 2: 't2', 3: 't3'}
(1 rows)
Besides append, there is support for removing elements with the remove option, and for prepend (lists only). The documentation contains examples of that.
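For illustration, a hedged sketch of remove under my own assumptions (a hypothetical table test.m2 with a list column; not from the original answer):

// Hypothetical table: CREATE TABLE test.m2 (id int PRIMARY KEY, l list<text>);
// Removing the element 't1' from the list column uses the same column syntax as append
val toRemove = Seq((1, Seq("t1"))).toDF("id", "l")
toRemove.rdd.saveToCassandra("test", "m2", SomeColumns("id", "l" remove))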
Upvotes: 1