Reputation: 2634
I have a Cassandra table with the following structure:
CREATE TABLE myKeyspace.myTable (
rowkey text,
columnname text,
columnvalue text,
PRIMARY KEY (rowkey, columnname)
)
I wish to insert data into it with the Spark Cassandra connector.
My Spark Dataset is of type Dataset[Seq[(String, String, String)]].
I want to convert it to Dataset[(String, String, String)] so that it can be inserted into the table using the .rdd.saveToCassandra API.
Please assist with the conversion, or is there a direct way to use the Dataset[Seq[(String, String, String)]] as-is?
Upvotes: 0
Views: 214
Reputation: 10382
Call flatMap on the Dataset[Seq[(String, String, String)]]. Check below and please let me know if it does not work.
scala> dds
res124: org.apache.spark.sql.Dataset[Seq[(String, String, String)]] = [value: array<struct<_1:string,_2:string,_3:string>>]
scala> dds.printSchema
root
|-- value: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _1: string (nullable = true)
| | |-- _2: string (nullable = true)
| | |-- _3: string (nullable = true)
scala> dds.flatMap(d => d)
res126: org.apache.spark.sql.Dataset[(String, String, String)] = [_1: string, _2: string ... 1 more field]
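For completeness, here is a minimal sketch of the full write path, assuming a SparkSession named spark with the Cassandra connector configured (spark.cassandra.connection.host etc.); the sample data and the lowercase keyspace/table names (unquoted CQL identifiers are stored lowercase) are assumptions for illustration:
import com.datastax.spark.connector._
import org.apache.spark.sql.{Dataset, SparkSession}

val spark: SparkSession = SparkSession.builder().getOrCreate()
import spark.implicits._

// Dataset[Seq[(String, String, String)]] as in the question (sample data assumed)
val dds: Dataset[Seq[(String, String, String)]] = Seq(
  Seq(("row1", "colA", "valA"), ("row1", "colB", "valB"))
).toDS()

// Flatten each Seq into individual (rowkey, columnname, columnvalue) tuples,
// then drop to an RDD and save with an explicit column mapping
dds.flatMap(d => d)
  .rdd
  .saveToCassandra("mykeyspace", "mytable",
    SomeColumns("rowkey", "columnname", "columnvalue"))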
Upvotes: 4