Reputation: 5892
In Spark I have a DataFrame with a fixed column order:
agg_id,agg_key,agg_val,req_num,clk_num
When I create a similar table in Cassandra, the order of the non-key columns is not preserved:
CREATE TABLE mytable (
    agg_id int,
    agg_key int,
    agg_val text,
    req_num bigint,
    clk_num bigint,
    PRIMARY KEY ((agg_id, agg_key), agg_val)
) WITH CLUSTERING ORDER BY (agg_val ASC)
So when I run desc mytable, it shows the columns in the wrong order (clk_num first, then req_num).
As a result, when the following code runs, the data is inserted in the wrong order:
ds.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "online_aggregation",
    "table" -> cassOutTable))
  .mode(SaveMode.Append)
  .save
My question is: how can I set the column names here? Can I add some property to the options Map, or slightly change the code so that it works correctly? One limitation: the DataFrame itself must not be changed (it might be written to multiple sources).
Upvotes: 0
Views: 311
Reputation: 26
Just select the columns in the required order before writing:
ds
  .select("agg_id", "agg_key", ..., "clk_num")
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "online_aggregation",
    "table" -> cassOutTable))
  .mode(SaveMode.Append)
  .save
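Regarding the "no changes of the DF itself" limitation: select returns a new DataFrame and leaves ds untouched, so the same ds can still be written to other sinks. Below is a minimal sketch of that idea, assuming a local SparkSession and a made-up sample row; the object name ReorderBeforeWrite, the helper inOrder, and the sample data are hypothetical, and the Cassandra write is left out so that it runs without a cluster.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ReorderBeforeWrite {
  // Column order as listed in the question.
  val targetOrder: Seq[String] =
    Seq("agg_id", "agg_key", "agg_val", "req_num", "clk_num")

  // Hypothetical helper: select builds a new DataFrame, the input is not modified.
  def inOrder(df: DataFrame, order: Seq[String]): DataFrame =
    df.select(order.map(col): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("reorder-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row with the columns deliberately shuffled.
    val ds = Seq((7L, 3L, "a", 1, 2))
      .toDF("clk_num", "req_num", "agg_val", "agg_id", "agg_key")

    val ordered = inOrder(ds, targetOrder)

    // The new DataFrame has the requested order...
    assert(ordered.columns.toSeq == targetOrder)
    // ...while the original keeps its own column order.
    assert(ds.columns.head == "clk_num")

    spark.stop()
  }
}
```

In the code from the question, ordered would take the place of ds in front of .write, with the same format, options, and mode as before.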
Upvotes: 1