Reputation: 153
I have two column families in Cassandra with the same structure:
CREATE TABLE keyspace.table (
    id bigint,
    firstname text,
    lastname text,
    countryId bigint,
    cityId bigint,
    solr_query text,
    PRIMARY KEY (id)
);
In the second table the number of rows is three times larger than in the first table. For example, in the first table I would have the row (firstname, lastname, countryId, cityId):
John, Doe, 100, 8
and in the second table I would have the rows (firstname, lastname, countryId, cityId):
John, Doe, 222, 1
John, Doe, 100, 8
John, Doe, 333, 9
I need to create a single column family using collections, so that I have only one row for John Doe with the primary countryId (100), plus a collection of the other (countryId, cityId) pairs. I added a user-defined type (UDT):
CREATE TYPE hints_info (
    countryId bigint,
    cityId bigint
);
and added a collection to my first column family:
ALTER TABLE table1 ADD hints map<int, frozen<hints_info>>;
So for the example above my row would look like:
John, Doe, 100, 8, {1: {countryId: 222, cityId: 1}, 2: {countryId: 333, cityId: 9}}
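For reference, writing such a map of frozen UDTs for a single row could look like the CQL below; the `id` value is made up for illustration:

```sql
UPDATE table1
SET hints = {1: {countryId: 222, cityId: 1}, 2: {countryId: 333, cityId: 9}}
WHERE id = 123;
```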
Can I do this from Cassandra, using CQL? That is, take each row from the first column family, match it against the rows from the second column family, and add the missing info (the pairs that don't exist in the first column family) into the collection? All my data is already imported into Cassandra. If I were to use C#, for example, what would the syntax be to update the collection in my existing column family with the data from the second column family?
Thank you!
Upvotes: 2
Views: 231
Reputation: 3784
Cassandra does not support reading from one table and inserting into another in a single statement. CQL is suited to simple read/write operations and does not have the complex query support (joins, `INSERT ... SELECT`) you find in the relational world.
I would advise creating a third table with the desired structure and copying data from the two original tables gradually until you are satisfied. The reason is that your application can keep working against the existing table structure while you migrate in the background, and the original tables remain intact if something goes wrong. Once you are satisfied, you just change your application logic to read from the new table.
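The merge step itself (before issuing the actual writes) can be sketched in plain Python. The row dictionaries and field names below are assumptions standing in for whatever your driver returns; the idea is to group the second table's rows per person, drop the pair already stored in the primary row, and number the rest into the `hints` map:

```python
def build_hints(primary_row, extra_rows):
    """Build the hints map for one person: index -> {countryId, cityId}."""
    hints = {}
    idx = 1
    for row in extra_rows:
        # Skip the pair that is already stored in the primary row.
        if (row["countryId"], row["cityId"]) == (
            primary_row["countryId"],
            primary_row["cityId"],
        ):
            continue
        hints[idx] = {"countryId": row["countryId"], "cityId": row["cityId"]}
        idx += 1
    return hints


# The John Doe example from the question:
primary = {"firstname": "John", "lastname": "Doe", "countryId": 100, "cityId": 8}
extras = [
    {"firstname": "John", "lastname": "Doe", "countryId": 222, "cityId": 1},
    {"firstname": "John", "lastname": "Doe", "countryId": 100, "cityId": 8},
    {"firstname": "John", "lastname": "Doe", "countryId": 333, "cityId": 9},
]
print(build_hints(primary, extras))
# {1: {'countryId': 222, 'cityId': 1}, 2: {'countryId': 333, 'cityId': 9}}
```

From there, each computed map becomes one `UPDATE ... SET hints = ...` against the new table, which you can issue with the DataStax driver in C#, Java, or Python.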
As for the migration itself, you have a couple of options:

1. Write a small background job in your application code (C# in your case) that reads from the second table and updates the new one.
2. Use Spark to copy and transform the data.
Option 2 has a lot of overhead for this use case, since you must install and learn a whole new tool (Spark); that is why we created a migration tool for C#, Java, and Scala for these kinds of small ETL tasks. Option 1 is the easiest, and if you already have infrastructure in your code for background jobs, go for it.
Upvotes: 0