Baptiste Merliot

Reputation: 861

Run long Java processes on demand with AWS

I want to process a large CSV file (millions of lines) with a Java application on AWS and write the results to another CSV file.

The application is packaged as a single jar and can be run with a shell command such as java -jar myJar.jar -option1 -option2.

The application can be triggered at any time, whenever a user uploads a CSV file.

Problem: Lambda works for small files, but Lambda functions are limited in execution time, RAM, CPU and temporary file storage. They are really meant for short processes.

Problem: Keeping a cluster running, even when idle, means paying for it.

Is there a way to run this jar without having to re-code its equivalent in a custom AWS technology?

EDIT : To answer the comments

Upvotes: 1

Views: 491

Answers (1)

qkhanhpro

Reputation: 5220

There are multiple places where you can make this more efficient and save money.

Requires coding:

  • If it is not strictly necessary to process the whole file (millions of lines) at once, try to break it into smaller pieces.
  • Write a Lambda that reacts to the CSV upload, spawns EC2 instances on your behalf and sends the jobs to them for processing (this takes quite a bit of configuration work, I believe).
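As a rough illustration of the first point, here is a minimal sketch of splitting a large CSV into smaller chunk files, each small enough to be handled by a short-lived worker. The class name CsvSplitter, the chunk file naming and the chunk size are all assumptions made for this example, not part of the original application. It also assumes the file has a single header line and no quoted fields containing line breaks.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: streams a CSV and writes fixed-size chunk files.
public class CsvSplitter {

    /**
     * Writes chunk files of at most rowsPerChunk data rows into outputDir,
     * repeating the header line at the top of every chunk.
     * The input is read line by line, so it never has to fit in memory.
     */
    public static List<Path> split(Path input, Path outputDir, int rowsPerChunk) {
        if (rowsPerChunk <= 0) {
            throw new IllegalArgumentException("rowsPerChunk must be positive");
        }
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            Files.createDirectories(outputDir);
            String header = reader.readLine();
            if (header == null) {
                return chunks; // empty input: nothing to split
            }
            BufferedWriter writer = null;
            try {
                int rowsInChunk = 0;
                String line;
                while ((line = reader.readLine()) != null) {
                    // Open a new chunk for the first row, or when the current one is full.
                    if (writer == null || rowsInChunk == rowsPerChunk) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path chunk = outputDir.resolve(
                                String.format("chunk-%05d.csv", chunks.size()));
                        chunks.add(chunk);
                        writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                        writer.write(header);
                        writer.newLine();
                        rowsInChunk = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    rowsInChunk++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }
}
```

A call such as CsvSplitter.split(Paths.get("big.csv"), Paths.get("chunks"), 100000) would then produce chunks/chunk-00000.csv, chunks/chunk-00001.csv and so on. Whether the results can simply be concatenated afterwards depends on whether the rows are processed independently of each other.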

Less coding required:

  • You don't need to keep a whole EC2 cluster running: start with one small instance and scale up as the workload rises. The same applies to the solution below.
  • Go for Elastic Beanstalk. It does the auto scaling for you; you just upload the .jar.

Note that the biggest Lambda is quite powerful: at the moment it offers 3000 MB of RAM with proportional CPU power, and it gives you 15 minutes per task. Keeping one t2.medium (4 GB RAM, 2 vCPUs) running 24/7 for a month would cost you about $38.

Or both:

  • You can keep a stopped EC2 instance, which costs a fraction of what an idle, running instance does. Lambda can start the instance, Auto Scaling can scale the number of instances up and down, and CloudWatch can put the last instance back into the "Stopped" state after some period of CPU idling.
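To put a rough number on "a fraction", here is a back-of-the-envelope sketch based on the ~$38 per month figure quoted above. The class name InstanceCost and the 60 running hours per month (about 2 hours a day) are assumptions chosen for illustration. The calculation covers instance hours only: a stopped instance is still billed for its attached EBS storage.

```java
// Hypothetical helper for a rough compute-cost estimate.
public class InstanceCost {

    // Average number of hours in a month (365 * 24 / 12).
    public static final double HOURS_PER_MONTH = 730.0;

    /** Hourly rate implied by the monthly bill of an always-on instance. */
    public static double hourlyRate(double alwaysOnMonthlyCost) {
        return alwaysOnMonthlyCost / HOURS_PER_MONTH;
    }

    /** Instance-hour cost for a month in which the instance only runs for hoursRunning hours. */
    public static double onDemandMonthlyCost(double alwaysOnMonthlyCost, double hoursRunning) {
        return hourlyRate(alwaysOnMonthlyCost) * hoursRunning;
    }

    public static void main(String[] args) {
        // Assumed workload: the instance is started for about 60 hours per month.
        double cost = onDemandMonthlyCost(38.0, 60.0);
        System.out.printf("Estimated instance cost: $%.2f per month%n", cost);
    }
}
```

Under these assumptions the instance hours come to roughly $3 per month instead of $38, before storage charges.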

Upvotes: 1
