eddie-ryan

Reputation: 285

nosql: MongoDB, Cassandra or alternative for data warehousing

I am stuck deciding whether to go with MongoDB or Cassandra for my database needs and would like input on my use case to guide that decision.

Requirements:

Data source

For example, we currently have 3 datacenters, 50 servers in total, 19 networks and 10 stats. These numbers will increase over time.

Data fetching:

Data storage:

Note: We need the ability to:

Example use case: on the front end, a query selects a date window, a report period, a specific datacenter, specific or all networks, specific or all statistics, and whether results are totalled or listed individually across the servers.

Example #1

 - From: August 16th 2012 -> April 16th 2013
 - Period: Daily
 - Data-center: EU A
 - Stat-type: Error
 - Servers: All
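As a rough illustration, Example #1 could be expressed as a MongoDB aggregation pipeline along the lines below. This is only a sketch: the field names (`ts`, `datacenter`, `stat`, `server`, `value`) and the helper `build_report_pipeline` are hypothetical, one document per measurement is assumed, and `$dateToString` needs MongoDB 3.0 or later. The pipeline is built as plain Python data, so no database connection is required to inspect it.

```python
from datetime import datetime


def build_report_pipeline(date_from, date_to, datacenter, stat_type,
                          servers=None, totalled=True):
    """Build a daily report pipeline (hypothetical schema, see lead-in)."""
    # Filter on the date window, the datacenter and the statistic type.
    match = {
        "ts": {"$gte": date_from, "$lte": date_to},
        "datacenter": datacenter,
        "stat": stat_type,
    }
    # servers=None means "all servers", so no server filter is added.
    if servers is not None:
        match["server"] = {"$in": list(servers)}

    # Group per calendar day; keep servers separate unless totals are wanted.
    group_id = {"day": {"$dateToString": {"format": "%Y-%m-%d", "date": "$ts"}}}
    if not totalled:
        group_id["server"] = "$server"

    return [
        {"$match": match},
        {"$group": {"_id": group_id, "total": {"$sum": "$value"}}},
        {"$sort": {"_id.day": 1}},
    ]


# Example #1: daily "Error" totals for datacenter "EU A" across all servers.
pipeline = build_report_pipeline(
    datetime(2012, 8, 16), datetime(2013, 4, 16), "EU A", "Error"
)
```

With pymongo, a list like this would be passed to `collection.aggregate(pipeline)`; the collection itself is left out here because its name and layout are not given.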

From reading similar articles on Stack Overflow and across the web, I've come to the conclusion that my best bet may be MongoDB, for its flexible queries and closeness to a relational database. Cassandra seems like an option if my write volumes were higher, although I do like the column-based model. I am a novice at database design and management (still a CS student), so ease of use is also a factor.

Given my use case, which NoSQL database is the best option?

Upvotes: 3

Views: 3960

Answers (2)

TomFH

Reputation: 65

Your title says, "nosql: MongoDB, Cassandra or alternative for data warehousing." What you describe, however, is not exactly data warehousing. If the question is what to build a proper data warehouse on, then the answer is none of these NoSQL data stores. The best data warehouse solution is a massively parallel processing (MPP) database in a shared-nothing environment. For query and statistical reporting needs, use a column-oriented database like Sybase IQ or Vertica. Either of these (MPP or column-oriented) will clean the clock of NoSQL in a true data warehouse environment.

Upvotes: 1

LMeyer

Reputation: 2631

You pretty much nailed it in your conclusion. To make up your mind, you mainly have to choose between the perks of each DB, that is:

Cassandra:

  • Better availability (masterless: every node accepts reads and writes, so no SPOF)
  • Better scalability (linear, elastic)
  • Better write performance

MongoDB:

  • Better queries (query API and native full-text search)
  • Ease of use (variety of APIs, JSON documents...)

Consistency isn't much of an issue here, I guess: Cassandra is eventually consistent by default (the level is tunable per query), and MongoDB is too when you read from secondaries. Even if MongoDB is probably easier to get started with (closer to the relational data model), Cassandra isn't that hard either; you just have to understand the column-oriented paradigm. From a technical point of view, I guess the answer depends on how you expect your system to grow in size and whether your queries will evolve.
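To give a feel for the column-oriented paradigm, here is a toy in-memory sketch in Python. It is not Cassandra code; the class `WideRowTable` and the key layout are made up for illustration. The idea it mimics is that rows live in partitions chosen by a partition key, are kept sorted by a clustering key inside each partition, and that a read targets one partition and a range within it.

```python
from collections import defaultdict


class WideRowTable:
    """Toy model of a wide-row table (illustration only, not a Cassandra API)."""

    def __init__(self):
        # partition key -> {clustering key -> value}
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, value):
        # Writing the same keys again overwrites the value (upsert behaviour).
        self._partitions[partition_key][clustering_key] = value

    def read(self, partition_key, day_from, day_to):
        # A read names one partition, then takes a range of the rows
        # sorted by clustering key; here the range is on the day component.
        rows = self._partitions.get(partition_key, {})
        return [(key, rows[key]) for key in sorted(rows)
                if day_from <= key[0] <= day_to]


table = WideRowTable()
# Partition key: (datacenter, stat). Clustering key: (ISO day, server).
table.insert(("EU A", "Error"), ("2012-08-17", "srv-01"), 4)
table.insert(("EU A", "Error"), ("2012-08-16", "srv-02"), 7)
table.insert(("EU A", "Error"), ("2012-08-16", "srv-01"), 3)
table.insert(("US B", "Error"), ("2012-08-16", "srv-09"), 1)

rows = table.read(("EU A", "Error"), "2012-08-16", "2012-08-16")
```

The practical consequence is that the table layout is designed around the queries you plan to run: filtering on something that is not part of the keys is where this model gets awkward, which is why it matters whether your queries will evolve.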

Upvotes: 5
