Julias

Reputation: 5892

How to set up a column order in the Spark Cassandra Connector

In Spark I have a DataFrame with columns in a fixed order:

agg_id,agg_key,agg_val,req_num,clk_num

When I create a similar table in Cassandra, the order of the non-key columns is not preserved:

CREATE TABLE mytable (
   agg_id int,
   agg_key int,
   agg_val text,
   req_num bigint,
   clk_num bigint,
 PRIMARY KEY ((agg_id,agg_key), agg_val )
) WITH CLUSTERING ORDER BY (agg_val asc)

So when I run desc mytable it shows me the wrong order (first clk_num, then req_num).

So when the following code runs, the data is inserted in the wrong order:

ds.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "online_aggregation",
    "table" -> cassOutTable) )
  .mode(SaveMode.Append)
  .save

My question is: how can I set the column names here? Can I add some property to the options Map, or slightly change the code so it works correctly? One limitation: no changes to the DF itself (it might be output to multiple sources).

Upvotes: 0

Views: 311

Answers (1)

user10744641

Reputation: 26

Just select the columns in the required order before writing:

ds
  .select("agg_id", "agg_key", ..., "clk_num")
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "online_aggregation",
    "table" -> cassOutTable) )
  .mode(SaveMode.Append)
  .save
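If you don't want to repeat the column list at every write site, a minimal sketch of the same select-then-write idea is below. The helper name writeOrdered and the columnOrder value are only illustrative (the order shown just echoes the columns from the question, so adjust it to whatever the target table expects). Note that select returns a new DataFrame, so ds itself is not modified, which keeps the "no changes to the DF" constraint.

import org.apache.spark.sql.{DataFrame, SaveMode}

// Illustrative only: keep the target column order in one place and
// reuse it for every Cassandra write. select produces a new DataFrame,
// leaving the original ds untouched.
val columnOrder = Seq("agg_id", "agg_key", "agg_val", "req_num", "clk_num")

def writeOrdered(ds: DataFrame, keyspace: String, table: String): Unit =
  ds.select(columnOrder.map(ds.col): _*)
    .write
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> keyspace, "table" -> table))
    .mode(SaveMode.Append)
    .save()

Usage would then be something like writeOrdered(ds, "online_aggregation", cassOutTable).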

Upvotes: 1
