Reputation: 5075

Understanding Cassandra Data Model

I have recently started learning No-SQL and Cassandra through this article. The author explains the data model through this diagram:

The author also gives the below column family example:

Book {

 key: 9352130677{ name: “Hadoop The Definitive Guide”, author:” Tom White”, publisher:”Oreilly”, priceInr;650, category: “hadoop”, edition:4},

 key: 8177228137{ name”” Hadoop in Action”, author: “Chuck Lam”, publisher:”manning”, priceInr;590, category: “hadoop”},

 key: 8177228137{ name:” Cassandra: The Definitive Guide”, author: “Eben Hewitt”, publisher:” Oreilly”, priceInr:600, category: “cassandra”},

 }

But in that tutorial and every other tutorial I have gone through, then end up creating regular tables in cassandra. I am unable to connect the Cassandar model with what I am creating.

For example, I created a column family called Employee as below:

create columnfamily Employee(empid int primary key,empName text,age int);

Now I inserted some data and my column family looks as this:

For me this looks like a regular relational table and not like the data model the author has explained. How do I create a Employee column family where each row represents an employee with different attributes? Something like:

Employee{
101:{name:Emp1,age:20}
102:{name:Emp2,salary:1000}
102:{manager_name:Emp3,age:45}
}

}

Upvotes: 3

Answers (3)

Gunwant

Reputation: 979

What you have understood is correct. Just believe it. Internally cassandra stores columns exactly like the image in your question. Now, what you are expecting is to insert a column which is not defined while creating the Employee table. For dynamic columns, you can always use Map data types .

For example

create table Employee(
empid int primary key,
empName text,
age int,
attributes Map<text,text>);

To add new attributes you can use below queries.

UPDATE Employee SET attributes =  { manager_name : Emp3, age:45 } WHERE empid = 102;

Update -

another way to to create a dynamic column model is as below

        create table Employee(
    empid int primary key,
    empName text,
    attribute text,
    attributevalue text,
    primary key (empid,empName,attribute)
    );

Lets take few inserts -

insert into Employee (empid,empName,attribute,attributevalue) values (102,'Emp1','age','25') ;
insert into Employee (empid,empName,attribute,attributevalue) values (102,'Emp1','manager','emp2') ;
insert into Employee (empid,empName,attribute,attributevalue) values (102,'Emp1','department','hr') ;

this data structure will create a wide row, and behaves like dynamic column. you can see primary key empid and name is common for all three rows, only attribute and value will change.

Hope this will help

Upvotes: 1

nevsv

Reputation: 2466

You need to understand that in the representation using cql, is may look like regular relational table, but the internal structure of the rows in Cassandra is completely different. It is saving different set of attributes for each employee, and the nulls you can see while querying with cql is just a representation of empty/nonexistent cells.
What you trying to achieve, is unstructured data model. Cassandra started with this model, and all was working as described in the tutorial you've read, but there is an opinion that unstructured data design is unhealthy to development and makes more problems than it solves. So, after sometime, Cassandra moved to the "structured" data structure (and from thrift to cql). It doesn't mean that you have to store all attributes for all keys/rows, it doesn't mean that all the rows are have same number of attributes, it just means that you have to declare attributes before you use them.
You can achieve some kind of unstructured data modeling using Map, List, Set, etc. data types, UDT (User defined types) or just saving your data as json string and parsing it on the application side.

Upvotes: 5

itstata

Reputation: 1168

Cassandra uses a special primary key called compositie key. This is the representation of the partitions. This is also one reason why cassandra scales well. The composite key is used to determine the nodes on which the rows are stored.

The result in your console may be a result set of rows, but the intern organization of cassandra is differnt from that. Have you ever tried to query a table without an primary key? You will quickly see that you can't query that flexible (because of the partitioning).

After that you will understand why we have to use a query-first design aproach for cassandra. This is completely different from RDBBS.

Upvotes: -2

Understanding Cassandra Data Model

Answers (3)

Related Questions