Reputation: 655
There's a lot of confusion around these terms. I'd like to throw out my understanding and see if people agree. I have seen conflicting and wrong definitions all over the web.
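In my mind, wide-column and column-family DBs are essentially the same thing. They are row-oriented: each row's data is stored together on disk, much like in a relational DB.
The main difference from a relational DB is that they don't have a fixed schema for columns and, obviously, can't do table joins.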
An example with 3 rows (column families): each row can have a different length and/or different columns, but on disk rowkey1's entire content is one contiguous run, followed by the other rows, similar to a relational DB:
rowkey1 k1-v k2-v k3-v
rowkey2 k1-v k4-v
rowkey3 k2-v k4-v k5-v
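To make the row-by-row layout concrete, here is a minimal Python sketch. The dict-of-dicts model and the `serialize_row_oriented` helper are hypothetical illustrations, not how any particular wide-column store is implemented; the point is only that rows need not share columns, yet each row's content stays contiguous.

```python
# Hypothetical in-memory model of a wide-column table: each row key maps to
# its own set of columns, so rows do not have to share a schema.
rows = {
    "rowkey1": {"k1": "v", "k2": "v", "k3": "v"},
    "rowkey2": {"k1": "v", "k4": "v"},
    "rowkey3": {"k2": "v", "k4": "v", "k5": "v"},
}


def serialize_row_oriented(table):
    """Lay out each row's columns contiguously, one row after another."""
    return [
        " ".join([key] + [f"{col}-{val}" for col, val in cols.items()])
        for key, cols in table.items()
    ]


layout = serialize_row_oriented(rows)
for line in layout:
    print(line)
```

Each printed line corresponds to one row as it would sit on disk in this simplified picture.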
On the other hand, the term columnar DB means the same as column-oriented DB. The data is stored on disk one column at a time, not one row at a time. This is great for time series or any multi-series analytical workload. As an added bonus, because each column holds the same type of data and is stored together, it compresses better.
An example, where each entry reads value:row id and each line is one column as laid out on disk:
a:1 b:2 c:3 d:4
10:1 9:2 8:3 7:4
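The same idea as a minimal Python sketch. The logical table below is an assumption inferred from the on-disk lines (reading each entry as value:row id), and `serialize_column_oriented` is a hypothetical helper, not a real columnar engine.

```python
# Hypothetical logical table: (row id, letter column, number column).
table = [
    (1, "a", 10),
    (2, "b", 9),
    (3, "c", 8),
    (4, "d", 7),
]


def serialize_column_oriented(records):
    """Store one column at a time, tagging each value with its row id."""
    n_cols = len(records[0]) - 1  # first field is the row id
    return [
        " ".join(f"{rec[i + 1]}:{rec[0]}" for rec in records)
        for i in range(n_cols)
    ]


columns = serialize_column_oriented(table)
for line in columns:
    print(line)
```

Because every value on a line comes from the same column, an analytical query that touches only one column reads only that line.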
Upvotes: 19
Views: 5692
Reputation: 1021
The definition from Wikipedia also helps further:
Wide-column stores such as Bigtable and Apache Cassandra are not column stores in the original sense of the term, since their two-level structures do not use a columnar data layout. In genuine column stores, a columnar data layout is adopted such that each column is stored separately on disk. Wide-column stores do often support the notion of column families that are stored separately. However, each such column family typically contains multiple columns that are used together, similar to traditional relational database tables. Within a given column family, all data is stored in a row-by-row fashion, such that the columns for a given row are stored together, rather than each column being stored separately. Wide-column stores that support column families are also known as column family databases.
Reference: https://en.wikipedia.org/wiki/Wide-column_store
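A rough Python sketch of the two-level structure the quote describes: column families are kept apart from each other, but inside a family the data is still laid out row by row. The family names, row keys, values, and the `serialize_families` helper are all made up for illustration.

```python
# Hypothetical two-level table: column family -> row key -> columns.
families = {
    "profile": {"row1": {"name": "Ann", "city": "Oslo"}, "row2": {"name": "Bo"}},
    "stats": {"row1": {"visits": 3}, "row2": {"visits": 7, "likes": 1}},
}


def serialize_families(fams):
    """Return one separate 'file' per column family, each stored row by row."""
    return {
        family: [
            " ".join([key] + [f"{col}={val}" for col, val in cols.items()])
            for key, cols in rows.items()
        ]
        for family, rows in fams.items()
    }


files = serialize_families(families)
for family, lines in files.items():
    print(family, lines)
```

In a genuine column store, by contrast, `name`, `city`, `visits` and `likes` would each be stored separately.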
Upvotes: 8