Zach H
Zach H

Reputation: 469

Can you load standard zeppelin interpreter settings from S3?

Our company is building up a suite of common internal Spark functions and jobs, and I'd like to make sure that our data scientists have access to all of these when they prototype in Zeppelin.

Ideally, I'd like a way for them to start up a Zeppelin notebook on AWS EMR, and have the dependency jar we build automatically loaded onto it without them having to manually type in the maven information manually every time (private repo location/credentials, package info, etc).

Right now we have the dependency jar loaded on S3, and with some work we could get a private maven repository to host it on.

I see that ZEPPELIN_INTERPRETER_DIR saves off interpreter settings, but I don't think it can load from a common default location (like S3, or something)

Is there a way to tell Zeppelin on an EMR cluster to load it's interpreter settings from a common location? I can't be the first person to want this.


Other thoughts I've had but have not tried yet:

Have a script that uses aws cmd line options to start a EMR cluster with all the necessary settings pre-made for you. (Could also upload the .jar dependency if we can't get maven to work)

Use a infrastructure-as-code framework to start up the clusters with the required settings.

Upvotes: 0

Views: 344

Answers (1)

ayplam
ayplam

Reputation: 1963

I don't believe it's possible to tell EMR to load settings from a common location. The first thought you included is the way to go imo - you would aws emr create ... and that create would include a shell script step to replace /etc/zeppelin/conf.dist/interpreter.json by downloading the interpreter.json of interest from S3, and then hard restart zeppelin (sudo stop zeppelin; sudo start zeppelin).

Upvotes: 1

Related Questions