Reputation: 713
With SQL databases, it is easy to compute statistical/aggregate functions such as covariance, standard deviation, kurtosis, skewness, deviations, means, medians, sums and products without moving the data out to an application server: http://www.xarg.org/2012/07/statistical-functions-in-mysql/
How can such computations be done efficiently on large datasets in NoSQL databases in general, and DynamoDB (Cassandra) in particular, as close to the store as possible, given that map/reduce "jobs" won't be real-time?
AWS RDS (MySQL, PostgreSQL, ...) is, well, not NoSQL, and Amazon Redshift (ParAccel), a column store, has a SQL interface but may be overkill ($6.85/hr). Redshift also has limited aggregation functionality (aggregate functions: http://docs.aws.amazon.com/redshift/latest/dg/c_Aggregate_Functions.html, window functions: http://docs.aws.amazon.com/redshift/latest/dg/c_Window_functions.html).
Upvotes: 1
Views: 1357
Reputation: 10052
MongoDB has some aggregation capabilities that might fit your needs: http://docs.mongodb.org/manual/aggregation/
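As a rough illustration, an aggregation pipeline that computes per-group statistics on the server might look like the following. This is a sketch under assumptions: the collection name "readings" and the fields "sensor" and "value" are hypothetical, and $stdDevPop/$stdDevSamp are only available in MongoDB 3.2 and later. The pipeline is just a plain JavaScript array; in the mongo shell you would pass it to db.readings.aggregate(pipeline).

```javascript
// Hypothetical documents: { sensor: "s1", value: 3.5 }.
// Only the stage and operator names ($group, $sum, $avg, ...) are MongoDB's.
const pipeline = [
  {
    // Group documents per sensor and compute the aggregates server-side.
    $group: {
      _id: "$sensor",
      count: { $sum: 1 },
      mean: { $avg: "$value" },
      min: { $min: "$value" },
      max: { $max: "$value" },
      stdDevPop: { $stdDevPop: "$value" },   // population std dev (MongoDB 3.2+)
      stdDevSamp: { $stdDevSamp: "$value" }, // sample std dev (MongoDB 3.2+)
    },
  },
  // Order the groups by mean value, highest first.
  { $sort: { mean: -1 } },
];
```

Only the small per-group result documents leave the database, which is the point of doing the aggregation close to the store.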
Upvotes: 1
Reputation: 1769
For databases that have no aggregate functionality (e.g. Cassandra), you will always have to pull some data out. Building distributed computation clusters close to your database is a popular option at the moment (using projects such as Storm). This way you can request and process data in parallel to perform your operations. Think of it as a "real-time" Hadoop (though it isn't the same).
Implementing such a setup is obviously more complicated than using a system that supports aggregation out of the box, so factor that into your decision. The upside is that, if needed, a cluster lets you perform complex custom analysis far beyond anything a traditional database will support.
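The key trick for parallel processing is to have each worker reduce its slice of the data to a small, mergeable summary instead of shipping raw rows around. The sketch below is plain JavaScript, not Storm's API: summarize, merge and finalize are hypothetical names, and the hard-coded partitions stand in for row ranges a worker might fetch from Cassandra. Each partition is summarized with Welford's online update (count, mean, M2), and the partial summaries are combined with the pairwise merge formula of Chan et al.

```javascript
// Partial summary for one partition: count, mean and M2 (the sum of squared
// deviations from the mean), accumulated with Welford's online update.
function summarize(values) {
  let count = 0, mean = 0, m2 = 0;
  for (const x of values) {
    count += 1;
    const delta = x - mean;
    mean += delta / count;
    m2 += delta * (x - mean);
  }
  return { count, mean, m2 };
}

// Combine two partial summaries (Chan et al. pairwise merge), so partitions
// can be processed independently and in any order.
function merge(a, b) {
  if (a.count === 0) return b;
  if (b.count === 0) return a;
  const count = a.count + b.count;
  const delta = b.mean - a.mean;
  const mean = a.mean + delta * (b.count / count);
  const m2 = a.m2 + b.m2 + delta * delta * (a.count * b.count / count);
  return { count, mean, m2 };
}

// Turn a merged summary into final statistics (population variance).
function finalize(s) {
  const variance = s.count > 0 ? s.m2 / s.count : NaN;
  return { count: s.count, mean: s.mean, variance, stdDev: Math.sqrt(variance) };
}

// Hypothetical partitions standing in for data fetched by parallel workers.
const partitions = [[2, 4, 4], [4, 5], [5, 7, 9]];
const total = partitions.map(summarize).reduce(merge, { count: 0, mean: 0, m2: 0 });
const stats = finalize(total);
```

Because merge is associative, the same summaries can be combined in a tree across many workers; only three numbers per partition cross the network.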
Upvotes: 2
Reputation: 29985
Well, in MongoDB you can create a kind of UDF (user-defined function) by saving JavaScript functions to the system.js collection:
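db.system.js.save( { _id : "Avg",
    value : function(key, values)
    {
        // Assumed helper: Variance below calls Avg(), which the snippet
        // did not define, so this is an illustrative definition.
        var sum = 0;
        for (var i = 0; i < values.length; i++)
        {
            sum += values[i];
        }
        return sum / values.length;
    }});

db.system.js.save( { _id : "Variance",
    value : function(key, values)
    {
        // Population variance: mean squared deviation from the average
        var squared_Diff = 0;
        var mean = Avg(key, values);
        for (var i = 0; i < values.length; i++)
        {
            var deviation = values[i] - mean;
            squared_Diff += deviation * deviation;
        }
        var variance = squared_Diff / values.length;
        return variance;
    }});

db.system.js.save( { _id : "Standard_Deviation",
    value : function(key, values)
    {
        var variance = Variance(key, values);
        return Math.sqrt(variance);
    }});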
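To check the logic outside MongoDB, here is a plain JavaScript sketch of the same functions with the same (key, values) signature (key is unused, as in the stored versions). Avg is an assumed helper, and SampleVariance is a hypothetical addition: the stored Variance divides by n (population variance), whereas SQL's VAR_SAMP/STDDEV_SAMP divide by n - 1, so it is worth knowing which one you need.

```javascript
// Arithmetic mean of the values.
function Avg(key, values) {
  let sum = 0;
  for (const v of values) sum += v;
  return sum / values.length;
}

// Population variance: divides by n, like the stored Variance function.
function Variance(key, values) {
  const mean = Avg(key, values);
  let squaredDiff = 0;
  for (const v of values) {
    const deviation = v - mean;
    squaredDiff += deviation * deviation;
  }
  return squaredDiff / values.length;
}

// Sample variance: rescales to divide by n - 1 (Bessel's correction).
function SampleVariance(key, values) {
  return Variance(key, values) * values.length / (values.length - 1);
}

// Population standard deviation, like the stored Standard_Deviation.
function Standard_Deviation(key, values) {
  return Math.sqrt(Variance(key, values));
}

const data = [2, 4, 4, 4, 5, 5, 7, 9];
```

For this data the mean is 5, the population variance 4 and the standard deviation 2, while the sample variance is 32/7.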
The description is here.
Upvotes: 1