CyberPlayerOne

Reputation: 3180

Pros and Cons of Amazon SageMaker VS. Amazon EMR, for deploying TensorFlow-based deep learning models?

I want to build neural network models for NLP and recommendation applications using TensorFlow. I plan to train these models and serve predictions on Amazon Web Services. The workload will most likely be distributed.

What are the pros and cons of SageMaker and EMR for TensorFlow applications?

They both have TensorFlow integrated.

Upvotes: 9

Views: 16134

Answers (2)

IsakBosman

Reputation: 1533

In general terms, they serve different purposes.

EMR is for when you need to process massive amounts of data and rely heavily on Spark, Hadoop, and MapReduce (EMR = Elastic MapReduce). Essentially, if your data volume is large enough to benefit from the Spark, Hadoop, Hive, HDFS, HBase, and Pig stack, go with EMR.

EMR Pros:

  • Generally, low cost compared to EC2 instances
  • Elastic, as the name suggests: you can provision what you need when you need it
  • Hive, Pig, and HBase out of the box

EMR Cons:

  • You need a very specific use case to truly benefit from everything EMR offers. Most users don't take advantage of its full feature set

SageMaker is an attempt to make machine learning easier and distributed. The marketplace provides out-of-the-box algorithms and models for quick use. It's a great service if you conform to the workflows it enforces, meaning creating training jobs and deploying inference endpoints.
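To make that workflow concrete, here is a rough sketch of a TensorFlow training job followed by an endpoint deployment. It assumes version 2 of the SageMaker Python SDK (`pip install sagemaker`) and valid AWS credentials. The script name `train.py`, the instance types, the framework/Python versions, and the role ARN and S3 URI you pass in are all placeholders I picked for illustration, not values from the question, so check them against what your account and region support. Multi-instance training may also need the estimator's `distribution` argument, which I have left out here.

```python
def build_estimator_kwargs(role_arn, entry_point="train.py", instance_count=2):
    """Collect the settings for a SageMaker TensorFlow training job.

    Every value below is an illustrative assumption; adjust to your setup.
    """
    return {
        "entry_point": entry_point,        # hypothetical training script
        "role": role_arn,                  # IAM role SageMaker assumes
        "instance_count": instance_count,  # more than 1 for multi-node training
        "instance_type": "ml.p3.2xlarge",  # assumed GPU instance type
        "framework_version": "2.11",       # assumed TensorFlow version
        "py_version": "py39",              # assumed matching Python version
    }


def train_and_deploy(role_arn, training_data_s3_uri):
    """Run a training job, then deploy the trained model to an endpoint.

    This creates billable AWS resources when called.
    """
    # Imported here so the module loads even without the SDK installed.
    from sagemaker.tensorflow import TensorFlow

    estimator = TensorFlow(**build_estimator_kwargs(role_arn))
    estimator.fit(training_data_s3_uri)  # starts the training job

    # Creates a real-time inference endpoint backed by one instance.
    predictor = estimator.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",  # assumed CPU instance type
    )
    return predictor
```

You would call `train_and_deploy(...)` with your own role ARN and S3 path, use `predictor.predict(...)` for inference, and call `predictor.delete_endpoint()` when you are done so the endpoint stops accruing charges.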

SageMaker Pros:

  • Easy to get up and running with Notebooks
  • Rich marketplace to quickly try existing models
  • Many different example notebooks for popular algorithms
  • Predefined kernels that minimize configuration
  • Easy to deploy models
  • Allows you to distribute inference compute by deploying endpoints

SageMaker Cons:

  • Expensive!
  • Enforces a certain workflow making it hard to be fully custom
  • Expensive!

Upvotes: 14

BSP

Reputation: 775

From AWS documentation:

Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these frameworks and related open-source projects, such as Apache Hive and Apache Pig, you can process data for analytics purposes and business intelligence workloads. Additionally, you can use Amazon EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

(...) Amazon SageMaker is a fully-managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. Amazon SageMaker removes all the barriers that typically slow down developers who want to use machine learning.

Conclusion: If you want to deploy AI models, just use AWS SageMaker.

Upvotes: 4
