Reputation: 879
Can I move some columns (cc_payment, keyid), as shown below, from a table in the Cassandra keyspace billing to another Cassandra keyspace, payments? payment_info is going to be a new table.
Is there any way to move the data directly? Or do I need to export to a CSV file with COPY TO and import it with COPY FROM? Since the data is huge, I am looking for options to move it directly from one keyspace to the other. We are using DataStax Cassandra.
Appreciate your help.
FROM
========
keyspace: billing
create table if not exists billing_info (
user_id text,
billing_id timeuuid,
cc_payment frozen<cc_payment>,
keyid text,
PRIMARY KEY((user_id), billing_id)
) WITH CLUSTERING ORDER BY (billing_id DESC);
TO
======
keyspace: payments
create table if not exists payment_info (
user_id text,
payment_id timeuuid,
cc_payment frozen<cc_payment>,
keyid text,
PRIMARY KEY((user_id), payment_id)
) WITH CLUSTERING ORDER BY (payment_id DESC);
Upvotes: 2
Views: 333
Reputation: 104
In Spark you can use this little snippet. You can do whatever you need to in updateColumns:
import com.datastax.spark.connector._

val myKeyspace = "oldkeyspace"
val myTable = "oldtable"
val newKeyspace = "newkeyspace"
val newTable = "newtable"

def updateColumns(row: CassandraRow): CassandraRow = {
  val inputMap = row.toMap
  // add or change whatever columns you need here
  val newData = Map("newColumn" -> "somevalue")
  val outputMap = inputMap ++ newData
  CassandraRow.fromMap(outputMap)
}

val result = sc.cassandraTable(myKeyspace, myTable)
  .map(updateColumns(_))
  .saveToCassandra(newKeyspace, newTable)
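Applied to the tables in the question, a hypothetical adaptation of updateColumns could carry the billing_id value over into the renamed payment_id column. This is only a sketch (table and column names are taken from the question's DDL, and it assumes payments.payment_info already exists):

import com.datastax.spark.connector._

// Copy billing.billing_info into payments.payment_info, moving the value of
// billing_id into the renamed payment_id column.
def renameBillingId(row: CassandraRow): CassandraRow = {
  val data = row.toMap
  // drop the old key and re-add its value under the new column name
  CassandraRow.fromMap(data - "billing_id" + ("payment_id" -> data("billing_id")))
}

sc.cassandraTable("billing", "billing_info")
  .map(renameBillingId)
  .saveToCassandra("payments", "payment_info")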
Upvotes: 1
Reputation: 87164
There are multiple ways to do this:
Copy files directly and then change table structure
Because the tables differ only by one column name, it could be much faster to copy the files directly, as follows:
- Create the table payments.payment_info with exactly the same structure as billing.billing_info (i.e. still with the billing_id column).
- Then on every node of the cluster, do the following:
  - execute nodetool flush billing billing_info
  - copy the files from billing/billing_info-<ID_of_the_table>/* into payments/payment_info-<ID_of_the_table>/
  - execute nodetool refresh payments payment_info
- Finally, rename the column: ALTER TABLE payments.payment_info RENAME billing_id TO payment_id;
Migrate data by copying, using, for example, DSBulk or Spark
If you're using DSE, then you can use DSBulk (it's better to take the latest version) to unload data from one table and load it into another. This can work without creating an intermediate copy, by writing the data to standard output and reading it from standard input via a Unix pipe, although in that case it will be slower because it can't achieve the necessary parallelism.
In the simplest case it will be invoked as follows, providing a mapping between the changed field names (see the documentation for details):
dsbulk unload -k ks1 -t table1 -c json | dsbulk load -k ks2 -t table2 -c json -m "mapping_to_accommodate_changes_in_field_names"
But the task will be more complex if you need to copy not only the data but other things as well, such as TTL and WriteTime. In that case you'll need to export them explicitly, and then load the data in several batches, one for each column separately.
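For the Spark route mentioned above, a minimal sketch using the DataFrame API of the Spark Cassandra Connector could look like the following. This is an illustration, not part of the original answer; it assumes payments.payment_info already exists and, as noted above, a plain copy like this does not preserve TTL or WriteTime:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("billing-to-payments").getOrCreate()

spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "billing", "table" -> "billing_info"))
  .load()
  .withColumnRenamed("billing_id", "payment_id")   // the only schema difference
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "payments", "table" -> "payment_info"))
  .mode("append")
  .save()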
Upvotes: 2