Mayank

Reputation: 303

How to download Zeppelin Notebook from AWS EMR

I am running a pre-installed Zeppelin Sandbox on AWS EMR 4.3 with Spark.

I've created a Notebook on Zeppelin (on the EMR cluster) and I now want to export that notebook so that I can quickly run it the next time I spin up an EMR cluster.

As far as I can tell, Zeppelin doesn't support exporting a notebook yet.

That's fine, because apparently if you can access the folder Zeppelin is 'installed' in, you can save the folder containing the notebook and then, presumably, place it in a Zeppelin installation on another computer to access the notebook.

(All this is from http://fedulov.website/2015/10/16/export-apache-zeppelin-notebooks/)

Trouble is I can't find where the 'Installation folder' for Zeppelin is on EMR.

PS: 'Installation folder' may be slightly incorrect. According to the post above I should be looking in /opt/zeppelin, which doesn't exist on the master node of my EMR cluster.

Upvotes: 2

Views: 3834

Answers (3)

Davos

Reputation: 5415

Zeppelin 0.5.6 and later, which is included in Amazon EMR release 4.4.0 and later, supports using a configuration JSON file to set the notebook storage: https://aws.amazon.com/blogs/big-data/import-zeppelin-notes-from-github-or-json-in-zeppelin-0-5-6-on-amazon-emr/

You need to create a directory called /user/notebook in an S3 bucket, where user is the name set as ZEPPELIN_NOTEBOOK_USER in the config below.

So if your S3 bucket is

S3://my-zeppelin-bucket-name

You need:

 S3://my-zeppelin-bucket-name/user/notebook

In the config below, give the bucket name without the S3:// prefix.

Save the config below as a .json file and store it in an S3 bucket. When you launch your cluster, there's a Configuration section where you point it to this file. At launch, the pieces of the configuration are injected into the configs of the various Hadoop tools on EMR. In this case zeppelin-env is edited at launch, before Zeppelin is installed.

Once you've run a cluster with this config, you can clone it and the clone will keep the config. Alternatively, script the launch with CloudFormation or something like Ansible so your clusters always start up with notebooks stored on S3.

[
  {
    "Classification": "zeppelin-env",
    "Properties": {

    },
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "ZEPPELIN_NOTEBOOK_STORAGE":"org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET":"my-zeppelin-bucket-name",
          "ZEPPELIN_NOTEBOOK_USER":"user"
        },
        "Configurations": [

        ]
      }
    ]
  }
]
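
If you launch clusters for more than one bucket or user, the config above can be generated by a small shell function so those two values live in one place. This is only a sketch: make_zeppelin_config is a made-up helper name, and it assumes the JSON layout from this answer is what your EMR release expects. It also strips a leading s3:// from the bucket argument, since the config wants the bare bucket name.

```shell
#!/bin/bash
# Print the zeppelin-env configuration for a given bucket and user.
# make_zeppelin_config is a hypothetical helper, not an AWS or Zeppelin tool.
make_zeppelin_config() {
  local bucket="${1#[sS]3://}"   # bare bucket name, any s3:// prefix removed
  local user="$2"                # notebooks end up under <bucket>/<user>/notebook
  cat <<EOF
[
  {
    "Classification": "zeppelin-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET": "${bucket}",
          "ZEPPELIN_NOTEBOOK_USER": "${user}"
        },
        "Configurations": []
      }
    ]
  }
]
EOF
}
```

Running make_zeppelin_config my-zeppelin-bucket-name user > zeppelin-config.json writes the file (the file name is just an example), which you can then upload to S3 and point to in the Configuration section at launch.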

Upvotes: 1

Franzi

Reputation: 1871

Another solution is to create a step in your EMR cluster that backs up all your notebooks, since exporting them one by one is a bit tedious.

Save the following script as s3://{s3_bucket}/notebook/notebook_backup.sh:

#!/bin/bash
# - Upload Notebooks backups.
aws s3 cp /var/lib/zeppelin/notebook/ "s3://{s3_bucket}/notebook/$(date +%Y/%m/%d)" --recursive

# - Update latest folder with latest Notebooks versions.
aws s3 rm s3://{s3_bucket}/notebook/latest --recursive
aws s3 cp /var/lib/zeppelin/notebook/ s3://{s3_bucket}/notebook/latest --recursive

Then add a Step to your EMR cluster to run the script.


s3://elasticmapreduce/libs/script-runner/script-runner.jar will allow you to run scripts from S3.
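
If you'd rather add the step from the command line than from the console, the step can be described in the shorthand that aws emr add-steps accepts for --steps. The helper below only builds that string. Treat it as a sketch: backup_step_spec and the step name ZeppelinNotebookBackup are made up for illustration, and the jar path is the one quoted above, which may differ in your region.

```shell
#!/bin/bash
# Build the --steps value that runs a script from S3 through script-runner.jar.
# backup_step_spec and the step name are illustrative, not AWS-defined names.
backup_step_spec() {
  local script_uri="$1"   # e.g. s3://{s3_bucket}/notebook/notebook_backup.sh
  local runner="s3://elasticmapreduce/libs/script-runner/script-runner.jar"
  printf 'Type=CUSTOM_JAR,Name=ZeppelinNotebookBackup,ActionOnFailure=CONTINUE,Jar=%s,Args=[%s]\n' \
    "$runner" "$script_uri"
}
```

You would then pass the result to the CLI, replacing the placeholders with your own cluster id and bucket:

    aws emr add-steps --cluster-id <your-cluster-id> --steps "$(backup_step_spec s3://{s3_bucket}/notebook/notebook_backup.sh)"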

Upvotes: 2

Mayank

Reputation: 303

Edit: Zeppelin now supports exporting a notebook in JSON format from the web interface itself! There is a small icon at the top center of the page that lets you export the notebook.

Zeppelin Notebooks can be found under /var/lib/zeppelin/notebook in an AWS EMR cluster with Zeppelin Sandbox. The notebooks are contained within folders in this directory.

These folders have random names and do not correspond to the name of the Notebook.

ls /var/lib/zeppelin/notebook/  
2A94M5J1Y  2A94M5J1Z  2AZU1YEZE  2B3D826UD 

There's a note.json file within each folder (which represents a Notebook) that contains the name of the Notebook and all other details.
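
Since the folder names don't tell you which notebook is which, a quick way to map them is to pull the name out of each note.json. The function below is a rough sketch: list_notebooks is a name I made up, and it assumes note.json is pretty-printed with the top-level "name" key indented by exactly two spaces. If your files are formatted differently, adjust the pattern or use a proper JSON tool.

```shell
#!/bin/bash
# Print "<folder id><TAB><notebook name>" for every notebook folder.
# Assumption: the top-level "name" key sits on its own line, indented by
# exactly two spaces; deeper "name" keys inside paragraphs are ignored.
list_notebooks() {
  local notebook_dir="${1:-/var/lib/zeppelin/notebook}"
  local note_file folder name
  for note_file in "$notebook_dir"/*/note.json; do
    [ -f "$note_file" ] || continue   # skip when the glob matches nothing
    folder="$(basename "$(dirname "$note_file")")"
    name="$(sed -n 's/^  "name": *"\(.*\)",\{0,1\}[[:space:]]*$/\1/p' "$note_file" | head -n 1)"
    printf '%s\t%s\n' "$folder" "${name:-<name not found>}"
  done
}
```

Calling list_notebooks with no argument looks in /var/lib/zeppelin/notebook; pass another directory as the first argument to check a different location.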

To export a notebook, choose the folder that corresponds to the notebook you are looking for and copy it into the new Zeppelin installation where you want the notebook to be available.

The above instructions are from: http://fedulov.website/2015/10/16/export-apache-zeppelin-notebooks/

The only difference is that in an AWS setup the Zeppelin notebooks are found in /var/lib/zeppelin/notebook.
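
To actually get a notebook off the cluster, one option is to pack its folder into a single archive on the master node and download that. A minimal sketch, assuming tar is available and that you already know the folder id; archive_notebook and its argument order are my own invention, not something Zeppelin provides.

```shell
#!/bin/bash
# Pack one notebook folder into a .tar.gz so it can be copied off the cluster.
# archive_notebook is an illustrative helper, not part of Zeppelin or EMR.
archive_notebook() {
  local note_id="$1"      # folder name, e.g. 2A94M5J1Y
  local archive="$2"      # output file, e.g. /tmp/2A94M5J1Y.tar.gz
  local notebook_dir="${3:-/var/lib/zeppelin/notebook}"
  if [ ! -f "$notebook_dir/$note_id/note.json" ]; then
    echo "no note.json under $notebook_dir/$note_id" >&2
    return 1
  fi
  # -C stores only the folder name in the archive, not the full path.
  tar -czf "$archive" -C "$notebook_dir" "$note_id"
}
```

After something like archive_notebook 2A94M5J1Y /tmp/2A94M5J1Y.tar.gz, copy the archive from the master node (for example with scp) and unpack it into the notebook directory of the other Zeppelin installation. You may need to restart Zeppelin there before the notebook shows up.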

Upvotes: 5
