Roy Reznik
Roy Reznik

Reputation: 2110

NoSQL or RDBMS for audit data

I know that similar questions were asked in the subject, but I still haven't seen anyone that completely contained all my requests.

I would start by saying that I only have experience in RDBMS's so I'm sorry if I get anything regarding NoSQL wrong.

I'm creating a database that would hold a large amount of audit logs (about 1TB).

I'm using it for:

  1. Fast data writing (a massive amount of audit logs is written all the time)

  2. Search - search over the audit data (search actions performed by a certain user, at a certain time or a certain action... the database should support searching any of the 'columns' very quickly)

  3. Analytics & Reporting - Generate daily, weekly, monthly reports of the data (They are predefined at the moment.. if they are more dynamic, does it affect the solution I should choose?)

Reliability (support for fail-over or any similar feature), Scalability (If I grow above 1TB to 2TB, 10TB or 100TB - does any of the solutions can't support this amount of data?) and of course Performance (in the use cases I specified) are very important to me.

I know RDBMS and that would be my easy way of starting, but I'm really concerned that after a while, the DB would simply not keep up with the pace.

My question is should I pick an RDBMS or NoSQL solution and why? If a NoSQL solution, since they are so different, which of them do you think fits my needs?

Upvotes: 8

Views: 5998

Answers (1)

rlb
rlb

Reputation: 1714

Generally there isn't a right or wrong answer here.

Fast data writing, either solution will be ok, although you didn't say what volume per second you are storing. Both solutions have things to watch out for.

Search (very quick) over all columns. For smaller volumes, say few hundred Gb, then either solution will be Ok (assuming skilled people put it together). You didn't actually say how fast/often you search, so if it is many times per minute this consideration becomes more important. Fast search can often slow down ability to write high volumes quickly as indexes required for search need to be updated.

Audit records typically have a time component, so searching that is time constrained, eg within last 7 days, will significantly speed up search times compared to search all records.

Reporting. When you get up to 100Tb, you are going to need some real tricks, or a big budget, to get fast reporting. For static reporting, you will probably end up creating one program that generates multiple reports at once to save I/O. Dynamic reports will be the tricky one.

My opinion? Since you know RDBMS, I would start with that as a method and ship the solution. This buys you time to learn the real problems you will encounter (the no premature optimization that many on SO are keen on). During this initial timeframe you can start to select nosql solutions and learn them. I am assuming here that you want to run your own hardware/database, if you want to use cloud type solutions, then go to them straight away.

Upvotes: 10

Related Questions