chs

Reputation: 652

Cloud platform for real-time computing-intensive task?

I'm looking for a cloud computing solution for the following scenario, but I can't find any service from Amazon AWS or similar providers that matches my problem description. Do you know of a cloud computing platform that fits?

The general problem: I want to run some data analysis on a data stream (only about 1k per second). The analysis is carried out by a bunch of independent threads that operate on that data stream. Each thread simply computes a Boolean value. The more threads I have, the better the computed result.
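To make the shape of this workload concrete, here is a minimal sketch in Python. It assumes each sample from the stream is handed to every detector and that all detectors are independent. The names `make_detectors` and `analyse` and the byte-sum rule are hypothetical stand-ins, not the real analysis. The executor is passed in, so the same loop can run on a thread pool or, for CPU-bound work in CPython, on a process pool (which would need picklable top-level functions instead of the lambdas used here).

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Sequence

# A detector is any independent function mapping one sample to a Boolean.
Detector = Callable[[bytes], bool]


def make_detectors(count: int) -> List[Detector]:
    """Build `count` toy detectors (hypothetical placeholders for the real analysis)."""
    def make(threshold: int) -> Detector:
        # Placeholder rule: is the byte sum of the sample above the threshold?
        return lambda sample: sum(sample) > threshold
    return [make(t) for t in range(count)]


def analyse(
    stream: Iterable[bytes],
    detectors: Sequence[Detector],
    executor: Executor,
) -> Iterator[List[bool]]:
    """Yield one list of Booleans per sample, one entry per detector."""
    for sample in stream:
        # Fan the sample out to every detector, then collect results in order.
        futures = [executor.submit(d, sample) for d in detectors]
        yield [f.result() for f in futures]


if __name__ == "__main__":
    samples = [bytes([1, 2]), bytes([0])]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for verdicts in analyse(samples, make_detectors(4), pool):
            print(verdicts)
```

Because the detectors share nothing, adding more of them only requires more workers, which is why raw core count matters more here than memory or disk.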

My current solution: I've scrounged a box with an Intel Core i7 from another department, but now they want it back :-).

The ideal solution: Some service that provides me with an abstract machine (like a JVM with unlimited resources) on which I can spawn a great number of threads. There also needs to be some kind of connection to stream the input data in and get the computed results back (< 1k per second). Things should happen in real time, as opposed to being scheduled for execution "in the next few minutes".

So the bottleneck is not memory or disk space, but just computing power and latency. (And since I need the data analysis just every now and then, cloud computing seems to be economically reasonable here.)

Upvotes: 0

Views: 585

Answers (4)

Nati Shalom

Reputation: 101

Interestingly enough, I was just writing a post on Making Hadoop Run Faster, in which I pointed to stream-based processing as a way to speed up the processing of feeds as they come in, rather than processing them in batch. The solution uses an open-source project named Cloudify.

Cloudify allows me to spawn this entire environment on Amazon or any other cloud through a single command and also auto-scale the processing as the load grows.

A demo environment with the source code and a step by step guide is available here

It sounds to me like this may address your needs. Let me know if it doesn't, and I'll dig in further to see if I can come up with other solutions.

Upvotes: 2

Lynn Langit

Reputation: 4060

For completeness, the major vendors offer a few categories of choices:

  1. Cloud compute that scales: from AWS it's EC2; from Google it's Google Compute Engine (still in private beta); from Microsoft it's Azure Virtual Machines (also still in private beta). There are, of course, many other vendors, such as Rackspace (which uses OpenStack and more). Given your scenario, I believe something in this category would be the best choice for you.

  2. Cloud-based MapReduce (running on Hadoop): from AWS that's Elastic MapReduce; from Google that's BigQuery; from Microsoft that's Hadoop on Azure (which is still in beta). There are other vendors in this space as well, such as Cloudera and Hortonworks; here's a list.

  3. Cloud-based database (either RDBMS or NoSQL): there are many choices here. Because you describe your scenario as 'compute intensive', I am thinking this may not be needed. However, depending on the amount and frequency of up/down traffic, if your scenario allows for batching, you may elect to upload, process and store in the cloud and then pull results down on a schedule. From AWS, there are many ways to host an RDBMS; RDS or EC2 are the usual choices. For Google, you can access MySQL via Google Cloud SQL. For Microsoft, your choice is SQL Azure or SQL Server on an Azure VM (the latter still in beta). For cloud-hosted NoSQL, you have AWS DynamoDB; from Google you have Google Cloud Storage or the High Replication store (the latter requires you to use GAE); from Microsoft you have Azure storage (tables, blobs and queues).

Upvotes: 1

dragonx

Reputation: 15143

I noticed you tagged google-app-engine. That's probably not what you're looking for; it's geared more toward web services. Google's relatively new Compute Engine matches your description, though.

http://cloud.google.com/products/compute-engine.html

Upvotes: 1

Avichal Badaya

Reputation: 3639

For your case, I would highly recommend Amazon Elastic MapReduce. You can refer to this document for details: Amazon EMR

It might be a bit of a struggle initially if you are new to AWS, but it will be great once you know how it works.

Upvotes: 1
