roblovelock

Reputation: 1981

EMR Zeppelin removing DepInterpreter

I am running Zeppelin 0.7.0 on an emr-5.4.0 cluster started with the default settings. EMR does not configure the %spark.dep interpreter.

I have edited the file /etc/zeppelin/conf/interpreter.json, which originally contained the following:

"2ANGGHHMQ": {
  "id": "2ANGGHHMQ",
  "name": "spark",
  "group": "spark",
  "properties": {
    "spark.yarn.jar": "",
    "zeppelin.spark.printREPLOutput": "true",
    "master": "yarn-client",
    "zeppelin.spark.maxResult": "1000",
    "spark.app.name": "Zeppelin",
    "zeppelin.spark.useHiveContext": "true",
    "args": "",
    "spark.home": "/usr/lib/spark",
    "zeppelin.spark.concurrentSQL": "false",
    "zeppelin.spark.importImplicit": "true",
    "zeppelin.pyspark.python": "python",
    "zeppelin.dep.localrepo":"/usr/lib/zeppelin/local-repo"
  },
  "interpreterGroup": [
    {
      "class": "org.apache.zeppelin.spark.SparkInterpreter",
      "name": "spark"
    },
    {
      "class": "org.apache.zeppelin.spark.PySparkInterpreter",
      "name": "pyspark"
    },
    {
      "class": "org.apache.zeppelin.spark.SparkSqlInterpreter",
      "name": "sql"
    }
  ],
  "option": {
    "remote": true,
    "port": -1,
    "perNoteSession": false,
    "perNoteProcess": false,
    "isExistingProcess": false
  }
}

I have to manually add the following entry to interpreterGroup and restart Zeppelin:

{
  "class": "org.apache.zeppelin.spark.DepInterpreter",
  "name": "dep"
}
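
The manual edit can be scripted so it is repeatable. This is a minimal sketch, not an EMR or Zeppelin feature: the function names are hypothetical, and it assumes the interpreter settings sit under a top-level "interpreterSettings" key in interpreter.json (it falls back to the top level if that key is absent).

```python
import json

# The entry from the question that EMR leaves out of the spark group.
DEP_ENTRY = {"class": "org.apache.zeppelin.spark.DepInterpreter", "name": "dep"}


def add_dep_interpreter(config):
    """Append the dep interpreter to every spark-group setting that lacks it.

    Returns True if the config was modified, False if nothing changed.
    """
    changed = False
    # Assumption: settings are keyed by id under "interpreterSettings".
    settings = config.get("interpreterSettings", config)
    for setting in settings.values():
        if not isinstance(setting, dict) or setting.get("group") != "spark":
            continue
        group = setting.setdefault("interpreterGroup", [])
        if not any(entry.get("class") == DEP_ENTRY["class"] for entry in group):
            group.append(dict(DEP_ENTRY))
            changed = True
    return changed


def patch_file(path="/etc/zeppelin/conf/interpreter.json"):
    """Rewrite interpreter.json in place only when an entry had to be added."""
    with open(path) as f:
        config = json.load(f)
    if add_dep_interpreter(config):
        with open(path, "w") as f:
            json.dump(config, f, indent=2)
        return True
    return False
```

Calling patch_file() on the master node needs write access to the file, and Zeppelin still has to be restarted afterwards for the change to take effect. The check for an existing entry makes the script safe to run more than once.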

Is there a way to make EMR use the default Zeppelin settings (and not remove this config)?

UPDATE

Could someone also explain why a cluster I created this morning, by cloning the original cluster, has a completely different config?

"interpreterGroup": [
    {
      "name": "spark",
      "class": "org.apache.zeppelin.spark.SparkInterpreter",
      "defaultInterpreter": false,
      "editor": {
        "language": "scala",
        "editOnDblClick": false
      }
    },
    {
      "name": "pyspark",
      "class": "org.apache.zeppelin.spark.PySparkInterpreter",
      "defaultInterpreter": false,
      "editor": {
        "language": "python",
        "editOnDblClick": false
      }
    },
    {
      "name": "sql",
      "class": "org.apache.zeppelin.spark.SparkSqlInterpreter",
      "defaultInterpreter": false,
      "editor": {
        "language": "sql",
        "editOnDblClick": false
      }
    }
  ]

Upvotes: 4

Views: 1716

Answers (1)

Darshan Mehta

Reputation: 30839

According to AWS, cloning a cluster copies only its basic configuration, not the changes you made after creating it. Also, EMR has no configuration API for changing Zeppelin's interpreter.json file, so at the moment the only way is to change the configuration manually.

Zeppelin does seem to have a set of REST APIs for changing interpreter settings, including an endpoint for creating a new interpreter setting. However, that endpoint does not seem to work with the following payload:

POST : http://[zeppelin-server]:[zeppelin-port]/api/interpreter/setting

Payload:

{
  "name": "dep",
  "group": "spark",
  "properties": {},
  "interpreterGroup": [
    {
      "class": "org.apache.zeppelin.spark.DepInterpreter",
      "name": "dep",
      "defaultInterpreter": true
    }
  ],
  "dependencies": []
}
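
To reproduce the attempt, the request can be built with the Python standard library. This is only a sketch: the helper names and the base URL in the usage note are assumptions, and as noted the endpoint did not accept this payload in my test, so do not expect a successful response.

```python
import json
import urllib.request

# Payload from above, unchanged.
PAYLOAD = {
    "name": "dep",
    "group": "spark",
    "properties": {},
    "interpreterGroup": [
        {
            "class": "org.apache.zeppelin.spark.DepInterpreter",
            "name": "dep",
            "defaultInterpreter": True,
        }
    ],
    "dependencies": [],
}


def build_setting_request(base_url, payload):
    """Build (but do not send) the POST to /api/interpreter/setting."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/interpreter/setting",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return (status code, response body as text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Usage would look like send(build_setting_request("http://[zeppelin-server]:[zeppelin-port]", PAYLOAD)) with the placeholders filled in. A non-2xx response raises urllib.error.HTTPError, whose body usually says why the payload was rejected.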

So, at the moment, the only option is to change interpreter.json manually. Should the above endpoint work for you, you can call it from a step at cluster creation.

Upvotes: 4
