Reputation: 2634
I have a Cassandra table with the following structure:
CREATE TABLE myKeyspace.myTable (
rowkey text,
columnname text,
columnvalue text,
PRIMARY KEY (rowkey, columnname)
)
I wish to insert data into it with the Spark Cassandra connector.
My Spark Dataset is of type Dataset[Seq[(String, String, String)]].
I want to convert it to Dataset[(String, String, String)] so that it can be inserted into the table using the .rdd.saveToCassandra API.
Please assist with the conversion, or is there a direct way to use the Dataset[Seq[(String, String, String)]] as-is?
Upvotes: 0
Views: 214
Reputation: 10382
Call flatMap on the Dataset[Seq[(String, String, String)]]. Check below and please let me know if it does not work.
scala> dds
res124: org.apache.spark.sql.Dataset[Seq[(String, String, String)]] = [value: array<struct<_1:string,_2:string,_3:string>>]
scala> dds.printSchema
root
|-- value: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _1: string (nullable = true)
| | |-- _2: string (nullable = true)
| | |-- _3: string (nullable = true)
scala> dds.flatMap(d => d)
res126: org.apache.spark.sql.Dataset[(String, String, String)] = [_1: string, _2: string ... 1 more field]
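For completeness, here is a minimal sketch of the full write path, assuming a SparkSession named spark with the Cassandra connector configured (spark.cassandra.connection.host etc.); the sample data and the lowercase keyspace/table names (unquoted CQL identifiers are stored lowercase) are assumptions for illustration:
import com.datastax.spark.connector._
import org.apache.spark.sql.{Dataset, SparkSession}

val spark: SparkSession = SparkSession.builder().getOrCreate()
import spark.implicits._

// Dataset[Seq[(String, String, String)]] as in the question (sample data assumed)
val dds: Dataset[Seq[(String, String, String)]] = Seq(
  Seq(("row1", "colA", "valA"), ("row1", "colB", "valB"))
).toDS()

// Flatten each Seq into individual (rowkey, columnname, columnvalue) tuples,
// then drop to an RDD and save with an explicit column mapping
dds.flatMap(d => d)
  .rdd
  .saveToCassandra("mykeyspace", "mytable",
    SomeColumns("rowkey", "columnname", "columnvalue"))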
Upvotes: 4