Reputation: 303
I am running a pre-installed Zeppelin Sandbox on AWS EMR 4.3 with Spark.
I've created a Notebook on Zeppelin (on the EMR cluster) and I now want to export that notebook so that I can quickly run it the next time I spin up an EMR cluster.
It turns out that Zeppelin doesn't support the export of a notebook as yet (?).
This is fine because apparently, if you can access the folder Zeppelin is 'installed' in, then you can save the folder containing the notebook and then presumably place the folder in a Zeppelin installation on another computer to access the notebook.
(All this is from http://fedulov.website/2015/10/16/export-apache-zeppelin-notebooks/)
Trouble is I can't find where the 'Installation folder' for Zeppelin is on EMR.
PS: 'Installation folder' may be slightly incorrect. According to the post above I should be looking in /opt/zeppelin, which doesn't exist on the master node of my EMR cluster.
Upvotes: 2
Views: 3834
Reputation: 5415
Zeppelin 0.5.6 and later, which is included in Amazon EMR release 4.4.0 and later, supports using a configuration JSON file to set the notebook storage. https://aws.amazon.com/blogs/big-data/import-zeppelin-notes-from-github-or-json-in-zeppelin-0-5-6-on-amazon-emr/
You need to create a directory called /user/notebook in an S3 bucket (user is the name set in the config below).
So if your S3 bucket is:
s3://my-zeppelin-bucket-name
you need:
s3://my-zeppelin-bucket-name/user/notebook
In the config below you don't include the s3:// prefix.
Save the config below as a .json file and store it in an S3 bucket. When you launch your cluster, there's a Configuration section where you point it to this file. When the cluster launches, the pieces of the configuration are injected into the configs of the various Hadoop tools on EMR. In this case zeppelin-env is edited at launch, before Zeppelin is installed.
Once you've run a cluster once, you can clone it and it will remember this config. Alternatively, use CloudFormation or something like Ansible to script this so your clusters always start up with notebooks stored on S3.
[
  {
    "Classification": "zeppelin-env",
    "Properties": {
    },
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET": "my-zeppelin-bucket-name",
          "ZEPPELIN_NOTEBOOK_USER": "user"
        },
        "Configurations": [
        ]
      }
    ]
  }
]
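If you launch clusters for several buckets or users, the configuration above can also be generated rather than edited by hand. The following is a minimal sketch, not part of the original answer: the function name `zeppelin_s3_config` is hypothetical, and the check for the s3:// prefix is only a convenience guard.

```python
import json


def zeppelin_s3_config(bucket: str, user: str) -> list:
    """Build the EMR configuration list that stores Zeppelin notebooks in S3.

    `bucket` is the bare bucket name, without the s3:// prefix.
    (Hypothetical helper; the structure mirrors the JSON shown above.)
    """
    if "://" in bucket:
        raise ValueError("pass the bucket name without the s3:// prefix")
    return [
        {
            "Classification": "zeppelin-env",
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {
                        "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
                        "ZEPPELIN_NOTEBOOK_S3_BUCKET": bucket,
                        "ZEPPELIN_NOTEBOOK_USER": user,
                    },
                    "Configurations": [],
                }
            ],
        }
    ]


if __name__ == "__main__":
    # Example values taken from the answer above; replace with your own.
    print(json.dumps(zeppelin_s3_config("my-zeppelin-bucket-name", "user"), indent=2))
```

Redirect the printed JSON to a file, upload it to S3, and point the cluster's Configuration section at it as described above.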
Upvotes: 1
Reputation: 1871
Another solution is to add a step to your EMR cluster that backs up all your notebooks at once, since exporting them one by one is a bit tedious. Save the following script to S3, for example as:
s3://{s3_bucket}/notebook/notebook_backup.sh
#!/bin/bash
# Upload a dated backup of all notebooks.
aws s3 cp /var/lib/zeppelin/notebook/ "s3://{s3_bucket}/notebook/$(date +%Y/%m/%d)" --recursive
# Refresh the "latest" folder so it holds only the current notebook versions.
aws s3 rm s3://{s3_bucket}/notebook/latest --recursive
aws s3 cp /var/lib/zeppelin/notebook/ s3://{s3_bucket}/notebook/latest --recursive
Then add a step to your EMR cluster that runs the script.
s3://elasticmapreduce/libs/script-runner/script-runner.jar lets you run scripts stored in S3.
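If you script your clusters, the step can be described as a plain dictionary. The sketch below is an illustration, not part of the original answer: `backup_step` is a hypothetical helper, the step name and the `CONTINUE` failure action are assumptions, and the dictionary follows the step shape accepted by the EMR `AddJobFlowSteps` API (for example through boto3's `add_job_flow_steps`).

```python
# JAR named in the answer above; it runs a script stored in S3.
SCRIPT_RUNNER_JAR = "s3://elasticmapreduce/libs/script-runner/script-runner.jar"


def backup_step(script_s3_path: str) -> dict:
    """Describe an EMR step that runs the backup script via script-runner.

    Hypothetical helper: the name and ActionOnFailure value are assumptions.
    """
    if not script_s3_path.startswith("s3://"):
        raise ValueError("script_s3_path must be an s3:// URI")
    return {
        "Name": "Back up Zeppelin notebooks",
        # Assumption: a failed backup should not stop the cluster.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": SCRIPT_RUNNER_JAR,
            # script-runner takes the script location as its first argument.
            "Args": [script_s3_path],
        },
    }


if __name__ == "__main__":
    # {s3_bucket} is the same placeholder used in the script above.
    print(backup_step("s3://{s3_bucket}/notebook/notebook_backup.sh"))
```

With boto3 installed and credentials configured, the result could be passed as `Steps=[backup_step(...)]` to `add_job_flow_steps` together with your cluster's `JobFlowId`. That call is not exercised here.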
Upvotes: 2
Reputation: 303
Edit: Zeppelin now supports exporting a notebook in JSON format from the web interface itself! There is a small icon at the top center of the page that lets you export the notebook.
Zeppelin notebooks can be found under /var/lib/zeppelin/notebook in an AWS EMR cluster with Zeppelin Sandbox. The notebooks are contained within folders in this directory.
These folders have random names that do not correspond to the names of the notebooks.
ls /var/lib/zeppelin/notebook/
2A94M5J1Y 2A94M5J1Z 2AZU1YEZE 2B3D826UD
There's a note.json file within each folder (each folder represents a notebook) that contains the name of the notebook and all other details.
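Because the folder names are random, it helps to list which notebook each folder holds before copying anything. The following is a minimal sketch, not part of the original answer: `notebook_names` is a hypothetical helper, and it assumes the notebook title is stored under a top-level "name" key in note.json.

```python
import json
from pathlib import Path


def notebook_names(notebook_dir: str) -> dict:
    """Map each notebook folder ID to the name stored in its note.json."""
    names = {}
    for note_file in sorted(Path(notebook_dir).glob("*/note.json")):
        with note_file.open(encoding="utf-8") as fh:
            note = json.load(fh)
        # Assumption: the notebook title lives under the top-level "name" key.
        names[note_file.parent.name] = note.get("name", "(unnamed)")
    return names


if __name__ == "__main__":
    # Directory reported above for Zeppelin on EMR.
    for folder, name in notebook_names("/var/lib/zeppelin/notebook").items():
        print(folder, name)
```

Run it on the master node to see the folder ID next to each notebook name, then pick the folder you want to copy.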
To export a notebook, choose the folder that corresponds to the notebook you are looking for and copy it into the notebook directory of the Zeppelin installation where you want the notebook to be available.
The above instructions are from: http://fedulov.website/2015/10/16/export-apache-zeppelin-notebooks/
The only difference is that in an AWS setup the Zeppelin notebooks are found in /var/lib/zeppelin/notebook
Upvotes: 5