Reputation: 22042
My use case is as follows: I have to store and query data from multiple third-party sources. The only predefined schema knowledge I have is that it consists of Entity-Attribute-Value tuples with an additional Source component: (E, A, V, S). I do not know in advance which attributes will be present (so using the actual attributes themselves as columns is problematic). The combination (E, A, V) must be unique, and I therefore thought using that as the composite key would be the best way to model this, so we get this:
CREATE TABLE t1 (
E text,
A text,
V text,
S text,
PRIMARY KEY(E, A, V)
);
This will enable me to ask queries like:
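For example (illustrative values; E is the partition key, A and V the clustering columns):
SELECT * FROM t1 WHERE E = 'a1';
SELECT * FROM t1 WHERE E = 'a1' AND A = 'b1';
SELECT * FROM t1 WHERE E = 'a1' AND A = 'b1' AND V = 'foo';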
The plan is to create mirrors of this table with differently ordered compound keys so that I can also answer queries of the form:
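For instance, with a hypothetical mirror t1_by_a keyed as PRIMARY KEY(A, V, E):
SELECT * FROM t1_by_a WHERE A = 'b1';
SELECT * FROM t1_by_a WHERE A = 'b1' AND V = 'foo';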
and so on and so forth (effectively, the mirror tables play the role of indexes, and to get full indexing I will need 6 copies of essentially the same data - I'm not sure about the scalability of that approach yet, but that's a separate question, I guess).
So far so good. However, the part I'm struggling with is this: V is in fact an object in its own right, with multiple properties. If this were a relational model, I would have V be a foreign key field pointing to a relation that maps its id to, for example, a type field and a value field. But getting rid of foreign keys (and the joins that go with them) is more or less the point of the BigTable approach, I guess, so I'm looking for a way to incorporate this into my table t1.
Of course, I could do something like this:
CREATE TABLE t1 (
E text,
A text,
V_id text,
S text,
V_type text,
V_value text,
PRIMARY KEY(E, A, V_id)
);
But the problem I see is that this fails to capture the (inverse-)functional relation between V's id, type, and value: with the above table, I could get, for example:
 E  | A  | V_id | V_type | V_value
----+----+------+--------+---------
 a1 | b1 | 1    | X      | foo
 a1 | b1 | 2    | X      | foo
 a1 | b2 | 1    | Y      | bar
What I want instead is to ensure that, given a V_id, the type and value are uniquely determined, and vice versa. I guess what I'm after is what older versions of Cassandra would have modeled as a nested super column, but I'm trying to achieve this with CQL3.
I briefly looked into collection types, but that doesn't quite seem to fit my use case.
Can anyone suggest a better way to model this, keeping in mind that ideally, I'd like to be able to retrieve complete (E, A, V) tuples in as few queries as possible? Or am I just overthinking it, and is the approach as I currently have it in fact fine (I can of course try to ensure uniqueness at the application level)?
Upvotes: 1
Views: 935
Reputation: 570
Jeen, consider using a map for V. See http://www.datastax.com/docs/1.2/cql_cli/using/collections. The map, list, and set types were introduced precisely to avoid situations like encoding/decoding key-value pairs into and from string/byte arrays.
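A rough sketch of what that could look like (the 'type'/'value' keys are just one possible encoding):
CREATE TABLE t1 (
E text,
A text,
V_id text,
S text,
V map<text, text>, -- holds e.g. { 'type' : 'X', 'value' : 'foo' }
PRIMARY KEY(E, A, V_id)
);
UPDATE t1 SET V = { 'type' : 'X', 'value' : 'foo' }
WHERE E = 'a1' AND A = 'b1' AND V_id = '1';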
Upvotes: 2
Reputation: 2595
You could denormalize t1 and create a new table in which you store the attributes for a particular V_id. This way you can achieve the equivalent of an SQL join. Assuming V_id is unique, your new table could have the following structure:
CREATE TABLE t2 (
V_id text PRIMARY KEY,
type text,
value text
);
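Reading then becomes a two-step lookup, something like:
SELECT V_id FROM t1 WHERE E = 'a1' AND A = 'b1';
-- then, using the V_id returned by the first query:
SELECT type, value FROM t2 WHERE V_id = '1';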
There is a very good read on mapping SQL operations to Cassandra: http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/
Upvotes: 1