Reputation: 1981
I am running Zeppelin 0.7.0 on an emr-5.4.0 cluster, started with the default settings. EMR does not configure the %spark.dep interpreter.
I have edited the file /etc/zeppelin/conf/interpreter.json, which originally contained the following:
"2ANGGHHMQ": {
  "id": "2ANGGHHMQ",
  "name": "spark",
  "group": "spark",
  "properties": {
    "spark.yarn.jar": "",
    "zeppelin.spark.printREPLOutput": "true",
    "master": "yarn-client",
    "zeppelin.spark.maxResult": "1000",
    "spark.app.name": "Zeppelin",
    "zeppelin.spark.useHiveContext": "true",
    "args": "",
    "spark.home": "/usr/lib/spark",
    "zeppelin.spark.concurrentSQL": "false",
    "zeppelin.spark.importImplicit": "true",
    "zeppelin.pyspark.python": "python",
    "zeppelin.dep.localrepo": "/usr/lib/zeppelin/local-repo"
  },
  "interpreterGroup": [
    {
      "class": "org.apache.zeppelin.spark.SparkInterpreter",
      "name": "spark"
    },
    {
      "class": "org.apache.zeppelin.spark.PySparkInterpreter",
      "name": "pyspark"
    },
    {
      "class": "org.apache.zeppelin.spark.SparkSqlInterpreter",
      "name": "sql"
    }
  ],
  "option": {
    "remote": true,
    "port": -1,
    "perNoteSession": false,
    "perNoteProcess": false,
    "isExistingProcess": false
  }
}
I have to manually add the following and restart zeppelin:
{
  "class": "org.apache.zeppelin.spark.DepInterpreter",
  "name": "dep"
}
Is there a way to make EMR use the default zeppelin settings (and not remove this config)?
UPDATE
Could someone also explain why the cluster I created this morning, by cloning the original cluster, has a completely different config?
"interpreterGroup": [
  {
    "name": "spark",
    "class": "org.apache.zeppelin.spark.SparkInterpreter",
    "defaultInterpreter": false,
    "editor": {
      "language": "scala",
      "editOnDblClick": false
    }
  },
  {
    "name": "pyspark",
    "class": "org.apache.zeppelin.spark.PySparkInterpreter",
    "defaultInterpreter": false,
    "editor": {
      "language": "python",
      "editOnDblClick": false
    }
  },
  {
    "name": "sql",
    "class": "org.apache.zeppelin.spark.SparkSqlInterpreter",
    "defaultInterpreter": false,
    "editor": {
      "language": "sql",
      "editOnDblClick": false
    }
  }
]
Upvotes: 4
Views: 1716
Reputation: 30839
According to AWS, cloning a cluster copies only the basic configuration, not the changes you made after creating it. Also, EMR has no configuration API for changing Zeppelin's interpreter.json file, so at the moment the only way is to change the configuration manually.
Zeppelin does seem to have a set of REST APIs that allow you to change interpreter settings, in particular an endpoint for creating interpreter settings. However, that endpoint does not seem to work with the following payload:
POST: http://[zeppelin-server]:[zeppelin-port]/api/interpreter/setting
Payload:
{
  "name": "dep",
  "group": "spark",
  "properties": {},
  "interpreterGroup": [
    {
      "class": "org.apache.zeppelin.spark.DepInterpreter",
      "name": "dep",
      "defaultInterpreter": true
    }
  ],
  "dependencies": []
}
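For anyone who wants to retry the call, here is a minimal Python sketch of how that request could be built and sent. The function names are my own, and the base URL passed in the usage note is only a placeholder for your own [zeppelin-server]:[zeppelin-port]. It only shows how the request is put together; it does not change the fact that the endpoint did not accept this payload for me.

```python
import json
import urllib.request

# Payload copied from the request shown above.
PAYLOAD = {
    "name": "dep",
    "group": "spark",
    "properties": {},
    "interpreterGroup": [
        {
            "class": "org.apache.zeppelin.spark.DepInterpreter",
            "name": "dep",
            "defaultInterpreter": True,
        }
    ],
    "dependencies": [],
}


def build_request(base_url):
    """Build (but do not send) the POST to /api/interpreter/setting."""
    body = json.dumps(PAYLOAD).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/interpreter/setting",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request; urlopen raises urllib.error.HTTPError on a 4xx/5xx reply."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

Usage would be something like send(build_request("http://localhost:8890")), with the host and port replaced by your own Zeppelin address.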
So, the only option at the moment is to change interpreter.json manually. If the above endpoint did work, you could add the call as a cluster creation step, as explained here.
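The manual edit itself can be scripted. Below is a rough Python sketch that adds the DepInterpreter entry from the question to every setting whose group is "spark". I am assuming the settings sit under a top-level "interpreterSettings" key in interpreter.json (the question only shows a single entry, so check your own file), and the function names are made up for illustration.

```python
import json

# The entry the question adds by hand.
DEP = {"class": "org.apache.zeppelin.spark.DepInterpreter", "name": "dep"}


def add_dep_interpreter(config):
    """Add the dep entry to each spark setting; return True if anything changed."""
    changed = False
    # Assumption: settings live under "interpreterSettings"; if that key is
    # missing, treat the whole document as the id -> setting mapping.
    settings = config.get("interpreterSettings", config)
    for setting in settings.values():
        if not isinstance(setting, dict) or setting.get("group") != "spark":
            continue
        group = setting.setdefault("interpreterGroup", [])
        if not any(entry.get("class") == DEP["class"] for entry in group):
            group.append(dict(DEP))
            changed = True
    return changed


def patch_file(path="/etc/zeppelin/conf/interpreter.json"):
    """Rewrite the file in place, but only if the dep entry was missing."""
    with open(path) as fh:
        config = json.load(fh)
    if add_dep_interpreter(config):
        with open(path, "w") as fh:
            json.dump(config, fh, indent=2)
```

Zeppelin still has to be restarted afterwards, as noted in the question, and the script needs write access to the file.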
Upvotes: 4