Reputation: 3689
I've got a website written in Java using the Spring Framework. I've got 10 batch jobs which will run concurrently; their job is to crawl selected websites, process the pages, and index them in SOLR. SOLR, the client application, and the database will be hosted on an Amazon AWS host.
I want to know whether it's a good idea to host these bandwidth-heavy batch jobs (they download web pages) on the web host (Amazon AWS), or whether I should run them on my local computer, where it will be easier to monitor them if they fail.
If I run the jobs locally, I will have to copy one table (URLS_SUBMITTED) from the client database on the host on a regular basis so the batch jobs can process the URLs. I will also need to establish a secure HTTPS connection to SOLR to update the documents.
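For the HTTPS update step, a minimal sketch in plain Java might look like the following. The SOLR URL, core name, and field names here are placeholders, not taken from the question; the request is built but not sent, so the sketch runs without a live SOLR instance. Using an `https://` base URL is what makes the transport encrypted; any authentication (e.g. basic auth headers) would be added to the same request.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: push one crawled page to SOLR's JSON /update handler over HTTPS.
public class SolrUpdateSketch {

    // Build the JSON array body that SOLR's /update handler accepts.
    static String updateJson(String id, String content) {
        return "[{\"id\":\"" + id + "\",\"content\":\"" + content + "\"}]";
    }

    public static void main(String[] args) {
        String body = updateJson("http://example.org/page1", "crawled page text");

        // Hypothetical endpoint; commit=true makes the document searchable at once.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://solr.example.com/solr/pages/update?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
        // would perform the call against a real SOLR instance.
    }
}
```

The SolrJ client library (`HttpSolrClient`) would normally replace the hand-built request, but the wire format above is what either approach sends.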
If I host the jobs on the web host, I will only need one database, but the jobs will be harder to maintain.
From experience, which method do you recommend?
Upvotes: 0
Views: 508
Reputation: 105083
Amazon Elastic MapReduce is what you need for this task. With EMR, your "batch jobs" become ordinary "jobs" that are parallelized and executed in the cloud.
Upvotes: 0
Reputation: 827
Do it on AWS.
They almost certainly have better network connectivity than you do, the bandwidth cost is probably trivial in the scheme of things, and you get the advantage of having everything hosted and managed in one place.
It should be just as easy (or easier) to monitor the servers in the cloud.
I'm intrigued by your comment about the jobs being "harder to maintain" on the web host. Feel free to add some comments explaining this further.
Upvotes: 1