Kiran

Reputation: 879

How to move columns from one keyspace to another in Cassandra

Can I move some columns (cc_payment, keyid), as shown below, from the Cassandra keyspace billing to another keyspace, payments? payment_info is going to be a new table.

Is there any way to move them directly? Or do I need to export to a CSV file with COPY TO and import it with COPY FROM? Since the data is huge, I am looking for options to move it directly from one keyspace to the other. We are using DataStax Cassandra.

Appreciate your help.

    FROM
========

keyspace:  billing
create table if not exists billing_info (
      user_id text,
      billing_id timeuuid,
      cc_payment frozen<cc_payment>,
      keyid text,
      PRIMARY KEY((user_id), billing_id)
) WITH CLUSTERING ORDER BY (billing_id DESC);

    TO
======
keyspace:  payments
create table if not exists payment_info (
      user_id text,
      payment_id timeuuid,
      cc_payment frozen<cc_payment>,
      keyid text,
      PRIMARY KEY((user_id), payment_id)
) WITH CLUSTERING ORDER BY (payment_id DESC);

Upvotes: 2

Views: 333

Answers (2)

Rahul Singh

Reputation: 104

With Spark you can use this little snippet; do whatever transformation you need in updateColumns:

import com.datastax.spark.connector._

val myKeyspace = "oldkeyspace"
val myTable = "oldtable"
val newKeyspace = "newkeyspace"
val newTable = "newtable"

// Transform each row here; this example just adds one column with a fixed value.
def updateColumns(row: CassandraRow): CassandraRow = {
  val inputMap = row.toMap
  val newData = Map("newColumn" -> "somevalue")
  val outputMap = inputMap ++ newData
  CassandraRow.fromMap(outputMap)
}

sc.cassandraTable(myKeyspace, myTable)
  .map(updateColumns(_))
  .saveToCassandra(newKeyspace, newTable)
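
This assumes a spark-shell session with the Spark Cassandra Connector on the classpath; a typical invocation looks roughly like the following (the connector version and contact point are placeholders, adjust for your environment):

spark-shell \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.1 \
  --conf spark.cassandra.connection.host=127.0.0.1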

Upvotes: 1

Alex Ott

Reputation: 87164

There are multiple ways to do this:

Copy files directly and then change table structure

Because the tables differ only in one column name, it can be much faster to copy the files directly, as follows:

  • Create a table payments.payment_info with exactly the same structure as billing.billing_info (see the sketch after this list)
  • Stop writing to billing.billing_info
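
In this question's terms, that first step could look roughly like the sketch below. The column keeps its old name billing_id so the copied files match; it is renamed at the end. Note that UDTs are keyspace-scoped, so the cc_payment type must be recreated in payments first (DESCRIBE TYPE billing.cc_payment in cqlsh shows its definition), and the table options should match billing's.

# Sketch only: recreate the cc_payment UDT in payments before running this.
cqlsh -e "
CREATE TABLE IF NOT EXISTS payments.payment_info (
    user_id text,
    billing_id timeuuid,
    cc_payment frozen<cc_payment>,
    keyid text,
    PRIMARY KEY ((user_id), billing_id)
) WITH CLUSTERING ORDER BY (billing_id DESC);"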

Then, on every node of the cluster, do the following (a combined shell sketch appears after this list):

  • Flush the table: nodetool flush billing billing_info
  • Change to Cassandra's data directory
  • As the same user that runs Cassandra, copy the files from billing/billing_info-<ID_of_the_table>/ into payments/payment_info-<ID_of_the_table>/
  • Execute nodetool refresh payments payment_info
  • Check in cqlsh that the data is available
  • Finally, once for the whole cluster (not per node, since schema changes are cluster-wide), rename the column with: ALTER TABLE payments.payment_info RENAME billing_id TO payment_id;
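
Put together, the per-node part might look roughly like this (the data directory path and the <ID_of_the_table> suffixes are placeholders that differ per environment):

# Run on every node, as the user that runs Cassandra.
nodetool flush billing billing_info
cd /var/lib/cassandra/data        # adjust to your data_file_directories setting
cp billing/billing_info-<ID_of_the_table>/* payments/payment_info-<ID_of_the_table>/
nodetool refresh payments payment_info

# Then once, from any single node:
cqlsh -e "ALTER TABLE payments.payment_info RENAME billing_id TO payment_id;"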

Migrate the data by copying it with, for example, DSBulk or Spark.

If you're using DSE, then you can use DSBulk (it's better to take the latest version) to unload data from one table and load it into another. This can work without creating an intermediate copy, by writing data to standard output and reading it from standard input via a Unix pipe, although in this case it will be slower because it can't achieve the necessary parallelism.

In the simplest case it is invoked as follows, providing a mapping between the changed field names (see the documentation for details):

dsbulk unload -k ks1 -t table1 -c json | dsbulk load -k ks2 -t table2 -c json -m "mapping_to_accommodate_changes_in_field_names"

But the task becomes more complex if you need to copy not only the data but also metadata such as TTL and WriteTime. In that case you'll need to export them explicitly and then load the data in several batches, one for each non-key column.
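
For reference, a hedged sketch of that per-column approach using DSBulk's -query option (file paths are placeholders, rows without a TTL may need separate handling, and you should verify the exact options against the DSBulk documentation for your version):

# Unload one non-key column together with its writetime and TTL...
dsbulk unload -url /tmp/cc_payment_export \
  -query "SELECT user_id, billing_id, cc_payment, writetime(cc_payment) AS wt, ttl(cc_payment) AS t FROM billing.billing_info"

# ...and load it back, preserving both values.
dsbulk load -url /tmp/cc_payment_export \
  -query "INSERT INTO payments.payment_info (user_id, payment_id, cc_payment) VALUES (:user_id, :billing_id, :cc_payment) USING TIMESTAMP :wt AND TTL :t"

# Repeat the pair for keyid: writetime and TTL are tracked per column.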

Upvotes: 2
