Reputation: 101
We have a question on whether Hadoop is suitable for simple tasks that require no application running, but require very fast reads and writes of small amount of data.
The requirement is to be able to write roughly a 100-200 bytes long messages with couple of indexes at rate 30 per second, at the same time to be able to read (search by those two indexes) at rate roughly 10 per seconds. The read queries must be very fast - 100-200 milliseconds max per query and return few matching records.
The total data volume is expected to reach 50-100 gb and is to be maintained at this rate by removing older records (something like daily task to delete records that are older than 14 days)
As you can see the total data volume is not really that big, but we are concerned that the search speed of Hadoop may be slower than our need anyway.
Is Hadoop a solution for this?
Thanks Nik
Upvotes: 0
Views: 100
Reputation: 25919
As Sam noted HBase is the Hadoop stack solution that can handle your requirements. However I wouldn't go with Hadoop if these are your only requirements from the data.
You can go with other NoSQL solutions like MongoDB or CouchDB or even MySQL or Postgres
Upvotes: 0
Reputation: 2969
Hadoop, alone, is very bad at serving out many small segments of data. However, HBase is an indexed table database-like system meant to be run on top of Hadoop. It is excellent at serving out small indexed files. I would research that as a solution.
Another problem to keep an eye on is that importing data into HDFS or HBase is not trivial. It can slow your cluster down quite a bit, so if Hadoop is your choice, you have to also solve how to get those 75GB into HDFS so Hadoop can touch them.
Upvotes: 2