unacorn

Reputation: 1022

Differences between using a SageMaker notebook vs a Glue (SageMaker) notebook

I have a machine learning job I want to run with SageMaker. For data preparation and transformation, I am using a notebook with some numpy and pandas steps.

I noticed that AWS Glue offers both SageMaker and Zeppelin notebooks, which can be created via a development endpoint.

I couldn't find much information online about the differences and the benefits of using one over the other (i.e. a SageMaker notebook that imports from S3 vs a notebook created from Glue).

From what I have researched and tried, it seems that I can achieve the same thing with both.

Is anyone able to shed light on this?

Upvotes: 2

Views: 5006

Answers (2)

Abdelrahman Maharek

Reputation: 862

The question isn't entirely clear, so let me cover the possible interpretations.

When you launch a Glue development endpoint, you can attach either a SageMaker notebook or a Zeppelin notebook. Either one is created and configured by Glue, and your script is executed on the Glue dev endpoint.

If your question is "what is the difference between a SageMaker notebook created from the Glue console and a SageMaker notebook created from the SageMaker console?"

When you create a notebook instance from the Glue console, the created notebook will always have public internet access enabled. This blog explains the difference between the networking configurations with SM notebooks. You also cannot create the notebook with a specific disk size, but you can stop the notebook once it's created and then increase the disk size.
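As a rough sketch of the "stop, then increase disk size" step, something like the following could work with boto3. The notebook name is hypothetical, the code assumes the instance is currently running and that your credentials allow these calls, and you would still need to start the instance again afterwards.

```python
def build_resize_request(notebook_name: str, volume_size_gb: int) -> dict:
    """Build the keyword arguments for SageMaker's update_notebook_instance call."""
    if not notebook_name:
        raise ValueError("notebook_name must be a non-empty string")
    if volume_size_gb <= 0:
        raise ValueError("volume_size_gb must be a positive integer")
    return {
        "NotebookInstanceName": notebook_name,
        "VolumeSizeInGB": volume_size_gb,
    }


def resize_notebook_volume(notebook_name: str, volume_size_gb: int) -> None:
    """Stop a running notebook instance, then request a larger EBS volume."""
    import boto3  # imported lazily so the helper above works without boto3

    sm = boto3.client("sagemaker")
    # The instance has to be stopped before its volume size can be changed.
    # Assumption: it is currently running (stopping a stopped instance errors).
    sm.stop_notebook_instance(NotebookInstanceName=notebook_name)
    sm.get_waiter("notebook_instance_stopped").wait(
        NotebookInstanceName=notebook_name
    )
    sm.update_notebook_instance(
        **build_resize_request(notebook_name, volume_size_gb)
    )


if __name__ == "__main__":
    # "my-glue-notebook" is a placeholder name, not one from the question.
    print(build_resize_request("my-glue-notebook", 50))
```

Calling `resize_notebook_volume("my-glue-notebook", 50)` would perform the actual change; the `__main__` block only prints the request so it is safe to run without AWS access.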

If your question is "what is the difference between SageMaker notebook and Zeppelin notebooks?"

The answer is that the first one uses Jupyter (very popular), while the second one uses Zeppelin.

If your question is "what is the difference between using only a SageMaker notebook versus using SM notebook + Glue dev endpoint?"

The answer is: if you are running plain pandas + numpy without Spark, a SM notebook on its own is much cheaper (provided you use a small instance type and your data is relatively small). However, if you need to process a large dataset and plan to use Spark, then SM notebook + Glue dev endpoint is the best option for developing the job, which will later be executed as a serverless Glue job (transformation job).

A SM notebook on its own is like running Python code on an EC2 instance, whereas SM notebook + Glue is used to develop ETL jobs which you can launch to process deltas.
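To illustrate the first case, here is a minimal sketch of the kind of pandas + numpy preparation that runs fine in a plain SM notebook without Spark. The column names (`amount`, `category`) and the sample data are made up for illustration; in practice the frame would usually be loaded from S3 instead.

```python
import numpy as np
import pandas as pd


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Toy data-preparation step using only pandas and numpy (no Spark)."""
    out = df.copy()
    # Fill missing numeric values with the column median.
    out["amount"] = out["amount"].fillna(out["amount"].median())
    # Log-transform a skewed column; log1p handles zeros safely.
    out["log_amount"] = np.log1p(out["amount"])
    # One-hot encode the categorical column.
    out = pd.get_dummies(out, columns=["category"])
    return out


if __name__ == "__main__":
    # In a notebook you might instead read from S3 with
    # pd.read_csv("s3://<your-bucket>/<your-key>.csv"), which needs s3fs.
    raw = pd.DataFrame(
        {
            "amount": [0.0, 10.0, None, 30.0],
            "category": ["a", "b", "a", "b"],
        }
    )
    print(prepare_features(raw))
```

Everything here happens in memory on the notebook instance, which is why this only makes sense while the data fits comfortably on a single machine.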

Upvotes: 6

Robert Kossendey

Reputation: 6998

If you are using only numpy and pandas, it doesn't make a real difference in terms of functionality. It does depend on your data as well, though: if you want to work with data sitting in a Glue table, it would be easier to work with Zeppelin notebooks via an endpoint.

Cost-wise, I am pretty sure that SageMaker is less expensive.

Upvotes: 0
