What is the best database/storage for storing statistical data?

Question

I have a system that collects real-time Apache log data from about 90-100 web servers. I have also defined some URL patterns.

Now I want to build another system that uses those logs to keep a running count of how many times each pattern occurs.
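As a minimal sketch of the counting step, assuming the logs are in Apache's common/combined format and using two made-up URL patterns (the real patterns are not given in the question), each request path can be matched against the patterns and tallied in memory:

```python
import re
from collections import Counter

# Hypothetical URL patterns; the actual patterns are not part of the question.
PATTERNS = {
    "product_page": re.compile(r"^/products/\d+$"),
    "search": re.compile(r"^/search\b"),
}

# Extracts the path from the quoted request line, e.g. "GET /search?q=x HTTP/1.1".
REQUEST_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*"')


def count_patterns(lines):
    """Return a Counter mapping pattern name -> number of matching requests."""
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if not m:
            continue  # skip lines that do not look like access-log entries
        path = m.group("path").split("?", 1)[0]  # ignore the query string
        for name, pattern in PATTERNS.items():
            if pattern.search(path):
                counts[name] += 1
    return counts


sample = [
    '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /products/42 HTTP/1.1" 200 512',
    '1.2.3.4 - - [10/Oct/2000:13:55:37 -0700] "GET /search?q=db HTTP/1.1" 200 128',
    '1.2.3.4 - - [10/Oct/2000:13:55:38 -0700] "GET /products/7 HTTP/1.1" 200 512',
]
print(count_patterns(sample))  # Counter({'product_page': 2, 'search': 1})
```

Counting in memory first means the storage layer sees one increment per pattern per batch instead of one write per log line.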

I have thought about using MySQL to store the statistics, updating them with a statement like "UPDATE table SET count=count+1 WHERE ...", but I'm afraid MySQL will be too slow with data coming from that many servers. I'm also looking for a database/storage solution that is more scalable and simpler: as an RDBMS, MySQL supports many features I don't need in this situation. Do you have any ideas?
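One common way to ease the write load of the UPDATE approach, whatever the backend, is to aggregate counts for a short window and flush them as a single batched upsert. The sketch below uses Python's built-in `sqlite3` only so that it runs without a server; the table name `pattern_counts` and column `hits` are hypothetical. In MySQL the rough counterpart of SQLite's `ON CONFLICT ... DO UPDATE` is `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3
from collections import Counter


def flush_counts(conn, counts):
    """Add a batch of per-pattern counts to the table in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO pattern_counts (pattern, hits) VALUES (?, ?) "
            "ON CONFLICT(pattern) DO UPDATE SET hits = hits + excluded.hits",
            counts.items(),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pattern_counts (pattern TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
)
flush_counts(conn, Counter({"search": 3, "product_page": 5}))  # first window
flush_counts(conn, Counter({"search": 2}))                     # second window
print(conn.execute("SELECT pattern, hits FROM pattern_counts ORDER BY pattern").fetchall())
# [('product_page', 5), ('search', 5)]
```

The trade-off is that counts in the database lag behind the logs by up to one flush interval, and an unflushed batch is lost if the process crashes.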

Niels van der Rest · Accepted Answer

Apache Cassandra is a high-performance column-family store that scales extremely well. The learning curve is a bit steep, but it will have no problem handling large amounts of data.

A simpler solution would be a key-value store like Redis, which is easier to understand than Cassandra. Redis only seems to support master-slave replication as a way to scale, so the write performance of your master server could become a bottleneck. Riak, by contrast, has a decentralized architecture without any central nodes. It has no single point of failure and no central bottleneck, so it's easier to scale out.
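To illustrate why a decentralized design spreads writes, here is a simplified sketch of consistent hashing, the partitioning idea behind Dynamo-style stores such as Riak. It is not Riak's actual code, and the `HashRing` class and node names are hypothetical. Each counter key is owned by whichever node follows the key's hash on a ring, so increments for different patterns land on different nodes instead of all going to one master:

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Place each node at `vnodes` pseudo-random positions to even out load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        # First 8 bytes of MD5 as an integer position on the ring.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def node_for(self, key):
        """Return the node owning `key`: the first ring position after its hash."""
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
for key in ["pattern:search", "pattern:product_page"]:
    print(key, "->", ring.node_for(key))
```

A useful property of this scheme is that adding a node only moves the keys the new node takes over; every other key keeps its owner, so the cluster can grow without reshuffling all counters.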


