Reputation: 11
I'm working with Cassandra (Datastax version) and I have an issue. I want to model a column which is (always) going to change. I can't just create a column family with 1,2,3,4..10 columns because tomorrow probably can change.
I think in collections, but I got to query into these. I need query into this information every second.
Ex: With map:
<'col1':'val1','col2':'val2'>
I need to query like this:
SELECT * FROM example WHERE 'col1' = 'val1' AND 'col2' = 'val2';
I don't know how to do this and this is extremely necessary for what I want to do.
I read that you can create a column (text) and implement a kind of format:
colum1 = 'val1\x01val2\x01'
But this doesn't resolve what I want to do, because I can't query in these fields (or don't know how)
How can I model something like that?
I can't use a collection because (according to what I read) it is slow.
Upvotes: 0
Views: 332
Reputation: 8812
You can create a table like this
CREATE TABLE dynamic_columns
partitionKey bigint,
column_name text,
column_value_text text,
column_value_boolean boolean,
column_value_bigint bigint,
column_value_uuid uuid,
column_value_timestamp timestamp,
....
PRIMARY KEY((partitionKey), column_name)
);
The partitionKey is here to indicate on which machine(s) your data will be stored in the cluster
The clustering column column_name
will store the label of your dynamic column. Then we have a list of normal columns, one for each data type (bigint, uuid, timestamp ....)
Let's take and example:
INSERT INTO dynamic_columns(partitionKey, column_name, column_value_text)
VALUES(1, 'firstname', 'John DOE');
INSERT INTO dynamic_columns(partitionKey, column_name, column_value_boolean)
VALUES(1, 'validity_state', true);
INSERT INTO dynamic_columns(partitionKey, column_name, column_value_timestamp)
VALUES(1, 'validity_date', '2016-03-13 12:00:00+0000');
So the idea is that we define a list of column_value, one for each existing type in Cassandra but we only insert data into the appropriate type column, like the examples above.
For querying, you'll need to create an index on each type column. Example:
CREATE INDEX ON dynamic_columns(column_value_boolean);
CREATE INDEX ON dynamic_columns(column_value_text);
CREATE INDEX ON dynamic_columns(column_value_boolean);
....
If you can switch to Cassandra 3.4, there is a better secondary index implementation called SASI, here the syntax for creating index:
// All data types EXCEPT text
CREATE CUSTOM INDEX ON types(column_value_boolean)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'SPARSE'};
// Text data type
CREATE CUSTOM INDEX ON types(column_value_text)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'PREFIX',
'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'
};
Then you can query your columns easily:
//Give me col1 where value = 'val1'
SELECT * FROM dynamic_columns
WHERE partitionKey=1
AND column_name='col1'
AND column_value_text='val1';
//Give me 'validity_state' = true
SELECT * FROM dynamic_columns
WHERE partitionKey=1
AND column_name='validity_state'
AND column_value_boolean=true;
Remark: you should always provide the partitionKey value in your SELECT otherwise Cassandra will perform a full cluster scan in worst case and kill your performance. With the SASI index since Cassandra 3.4, this problem is less critical but it is still strongly recommended to provide partitionKey when using secondary index
For more information on the importance of partition key, read this: http://www.planetcassandra.org/blog/the-most-important-thing-to-know-in-cassandra-data-modeling-the-primary-key/
Upvotes: 0