Reputation: 1102
A few days ago I read about wide-column stored type of NoSQL and exclusively Apache-Cassandra.
What I understand is that Cassandra consist of:
A keyspace(like database in relational databases) and supporting many column families or tables (Same as table in relational databases) and unlimited rows.
From Stackoverflow tags:
A wide column store is a type of key-value database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table.
In Cassandra all of the rows (in a table) should have a row key then each row key can have multiple columns. I read about differences in implementation and storing data of Relational database and NoSQL (Cassandra).
But I don't understand the difference between structure:
Imagine a scenario which I have a table (or column family in Cassandra):
When I execute a query (CQL) like this :
select * from users;
It gives me the result as you can see :
lastname | age | city | email
----------+------+---------------+----------------------
Doe | 36 | Beverly Hills | [email protected]
Jones | 35 | Austin | [email protected]
Byrne | 24 | San Diego | [email protected]
Smith | 46 | Sacramento | null
Jones2 | null | Austin | [email protected]
So I perform the above scenario in relational database (MS SQL) with the following query:
select * from [users]
And the result is:
lastname | age | city | email
----------+------+---------------+----------------------
Doe | 36 | Beverly Hills | [email protected]
Jones | 35 | Austin | [email protected]
Byrne | 24 | San Diego | [email protected]
Smith | 46 | Sacramento | NULL
Jones2 | NULL | Austin | [email protected]
I know that Cassandra supports dynamic column and I can perform this by using sth like:
ALTER TABLE users ADD website varchar;
But it is available in relational model for example in mssql the above code can be implemented too. Something like:
ALTER TABLE users ADD website varchar(MAX);
What I see is that the first select and second select result is the same.
In Cassandra , they just give a row key (lastname) as a standalone object but it is same as a unique field (like ID or a text) in mssql (and all relational databases) and I see the type of column in Cassandra is static (in my example varchar
) unlike what it describes in Stackoverflow tag.
So my questions is:
Is there any misunderstanding in my imagination about Cassandra?!
So what is different between two structure ?! I show you the result is same.
Is there any special scenarios (JSON like) that cannot be implemented in relational databases but Cassandra supports? (For example I know that nested column doesn't support in Cassandra.)
Thank you for reading.
Upvotes: 24
Views: 16207
Reputation: 864
In my experience CQL misleads a lot of people. First of all you would never want to do:
SELECT * FROM a_table_here;
On a production Cassandra cluster, since you are putting a huge load on your Coordinator node to aggregate all of the data from all of the other nodes. Also by default, you will be given back a maximum of 10000 "rows".
To understand how Cassandra stores your data, we need to establish a few terms first:
There's the Primary Key, in your case lastname
, this is hashed to determine which node in the cluster owns this range, and it's stored there (plus any replica nodes).
Next there's Cluster Columns, I don't know if you have any in your example, but you define them like PRIMARY KEY ((lastname),age, city)
. In that example you are clustering by age first then city, this is ORDERED.
Now for a simplistic high-level view of Cassandra for your use case, it stores the data as a Map to an ordered Multimap:
Doe -> 36:Beverly Hills -> [email protected]
Where 'Doe' is the Primary Key, which tells you which node(s) have that row of data. And 36:Beverly Hills
is the Ordered Clustering Keys (part of the ordered multimap key). Lastly [email protected] is the final value (can be multiple mind you) for the Map to a Multimap.
There's a lot of nuisances that I left out to make the example simple, for a more in-depth I would highly suggest reading: http://www.planetcassandra.org/making-the-change-from-thrift-to-cql/
Upvotes: 6
Reputation: 2014
We have to look at more complex example to see the differences :)
For start:
Table is defined as "two-dimensional view of a multi-dimensional column family".
The term "wide-rows" was related mainly to the Thrift API. In cql it is defined a bit differently, but underneath looks the same.
Comparing SQL and CQL. In SQL table is a set of rows. In simple example it looks like in CQL it is the same, but it is not. CQL table is a set of partitions, where each partition can be just a single row (e.g. when you don't have a clustering key) or multiple rows. Partition containing multiple rows is in Thrift therminology named "wide-row". To see how it is stored underneath, please read e.g. part about composite-keys from here.
There are more differences:
address
as type, and reuse this type in many places), or collection
can be a collection of user defined typesI hope I was able to make it a bit more clear for you. I recommend watching some vidoes (or reading slides) from Datastax Core Concepts Course as solid introduction to Cassandra.
Upvotes: 15