Reputation: 300
I want to build a system monitoring application for my servers which collects over 30k data points per minute from various applications such as MySQL, memcached, Apache, etc. I want to know which platform would be most suitable for such an application. My main options are HBase and Cassandra.
If I use HBase, what should my row key be for queries that must answer questions like: all machines with a particular IP or hostname, all machines running a particular application, or all machines in a particular data center or cluster? All of these attributes can change over a considerable period of time; the only stable identifier for a machine is its UUID. However, queries will not be based on the UUID itself but on IP, application type, application, and process.
Since roll-up and drill-down queries are not easy in HBase, are they easier in Cassandra? What should my priorities be in designing such a system? What other platforms could be considered?
Please also suggest the design specifications and data schema for such a system.
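To make the row-key question concrete, here is a hypothetical sketch (all field names, the delimiter, and the field order are my assumptions, not a settled design) of a composite HBase row key built from the attributes mentioned above:

```java
// Hypothetical composite row key for the monitoring schema described above.
// Field names, order, and the '|' delimiter are illustrative assumptions.
public class RowKeySketch {
    // Leading with the most frequently queried dimension (application type)
    // lets a prefix scan answer "all machines running mysql" cheaply, while
    // the UUID keeps the key unique even when IP/hostname change over time.
    static String rowKey(String appType, String dataCenter,
                         String uuid, long epochMinute) {
        return appType + "|" + dataCenter + "|" + uuid + "|" + epochMinute;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("mysql", "dc1", "9f8c-uuid", 27000000L));
    }
}
```

A key ordered this way only serves one access path efficiently; queries by IP or hostname would need a secondary index table or a full scan, which is part of why the choice of leading field matters so much.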
Upvotes: 0
Views: 231
Reputation: 4542
I think Splunk is exactly what you are looking for. It specializes in collecting and analyzing log files with Big Data technologies, and it also offers a free version, which is of course limited.
If you want to go with open-source software, I recommend splitting your task into two parts: a) storage, b) querying/analytics. The advantage of the split approach is that you can choose a suitable analytics system afterwards.
For a) I suggest using HDFS together with a log file collector such as Flume or Chukwa. You can also do some pre-filtering with these systems.
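For example, a minimal Flume agent that tails an Apache log into HDFS could look roughly like this (agent/component names and paths are illustrative):

```properties
# Sketch of a Flume agent: tail an Apache access log and write events to HDFS.
agent.sources = apacheSrc
agent.channels = memCh
agent.sinks = hdfsSink

agent.sources.apacheSrc.type = exec
agent.sources.apacheSrc.command = tail -F /var/log/apache2/access.log
agent.sources.apacheSrc.channels = memCh

agent.channels.memCh.type = memory

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/apache/%Y-%m-%d
agent.sinks.hdfsSink.channel = memCh
```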
For b), have a look at systems such as Hive, Drill, or Spark. I'm not sure HBase is the best idea, since you would be limiting the scope of your analysis from early on.
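In Hive, for instance, the kind of roll-up you asked about maps directly to SQL; a sketch assuming a hypothetical table `metrics(app_type, datacenter, host_uuid, metric, value, ts)`:

```sql
-- Roll up average metric values per application and data center;
-- WITH ROLLUP also emits per-application subtotals and a grand total.
SELECT app_type, datacenter, AVG(value) AS avg_value
FROM metrics
WHERE metric = 'cpu_pct'
GROUP BY app_type, datacenter WITH ROLLUP;
```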
Upvotes: 0