Reputation: 652
I'm looking for a cloud computing solution for the following scenario, but I can't find any service among Amazon AWS and the like that matches my problem description. Do you know of a cloud computing platform that fits my problem?
The general problem: I want to run some data analysis on a data stream (only about 1k per second). The analysis is carried out by a bunch of independent threads that operate on that data stream. Each thread simply computes a Boolean value. The more threads I have, the better the computed result.
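To make the workload concrete, here is a minimal sketch of that pattern in Python. Everything in it is an assumption for illustration: the threshold predicates stand in for the real analyses, the majority-style `vote` is just one guess at how the Booleans might be combined, and the names `make_predicate`, `analyze` and `vote` are hypothetical. Note also that in CPython, threads will not speed up CPU-bound work, so a real version would probably use processes or JVM threads instead.

```python
from concurrent.futures import ThreadPoolExecutor


def make_predicate(threshold):
    """Build a hypothetical analyzer: True if the sample exceeds its threshold."""
    def predicate(sample):
        return sample > threshold
    return predicate


def analyze(sample, predicates, pool):
    """Run every independent analyzer on one sample; return their Booleans in order."""
    return list(pool.map(lambda p: p(sample), predicates))


def vote(results):
    """Combine the Booleans into one score: the fraction that returned True."""
    return sum(results) / len(results)


# Ten independent analyzers (assumed); more analyzers give a finer-grained score.
predicates = [make_predicate(t) for t in range(10)]

# A short stand-in for the incoming data stream.
stream = [3, 7, 12]

with ThreadPoolExecutor(max_workers=4) as pool:
    scores = [vote(analyze(sample, predicates, pool)) for sample in stream]

print(scores)
```

Because the analyzers share nothing and each returns a single Boolean, the work splits cleanly across as many workers (or machines) as are available, which is why only computing power and latency matter here.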
My current solution: I've scrounged a box with an Intel Core i7 from another department, but now they want it back :-).
The ideal solution: Some service that provides me with an abstract machine (like a JVM with unlimited resources) on which I can spawn a great number of threads. Also there needs to be some kind of connection to stream the input data and get back the computed results (< 1k per second). Things should happen in real time (in contrast to being scheduled to be executed like "in the next few minutes").
So the bottleneck is not memory or disk space, but just computing power and latency. (And since I need the data analysis just every now and then, cloud computing seems to be economically reasonable here.)
Upvotes: 0
Views: 585
Reputation: 101
Interestingly enough, I was just writing a post on Making Hadoop Run Faster, in which I pointed to stream-based processing as a way to speed up the processing of feeds as they come in, rather than processing them in batch. The solution uses an open-source project named Cloudify.
Cloudify allows me to spawn this entire environment on Amazon or any other cloud through a single command and also auto-scale the processing as the load grows.
A demo environment with the source code and a step by step guide is available here
It sounds to me like this may address your needs. Let me know if this isn't the case, and I'll dig in further to see if I can come up with other solutions.
Upvotes: 2
Reputation: 4060
For completeness, the major vendors offer a few categories of choices:
Cloud compute which scales: from AWS it's EC2; from Google it's Google Compute Engine (still in private beta); from Microsoft it's Azure Virtual Machines (also still in private beta). There are, of course, many other vendors, such as Rackspace (which uses OpenStack and more). Given your scenario, I believe something in this category would be the best choice for you.
Cloud-based MapReduce (running on Hadoop): from AWS that's Elastic MapReduce; from Google that's BigQuery; from Microsoft that's Hadoop on Azure (which is still in beta). There are other vendors in this space as well (Cloudera, Hortonworks, etc.); here's a list.
Upvotes: 1
Reputation: 15143
I noticed you tagged google-app-engine. That's probably not what you're looking for; it's more for web services. Google's relatively new Compute Engine matches your description, though.
http://cloud.google.com/products/compute-engine.html
Upvotes: 1
Reputation: 3639
For your case, I would highly recommend Amazon Elastic MapReduce. You can refer to this document for details: Amazon EMR
It might be a bit of a struggle initially if you are new to AWS, but it works well once you know how to use it.
Upvotes: 1