Milos Bejda

Reputation: 435

AWS EMR Zeppelin is missing the MySQL interpreter

I launched a fresh AWS EMR Spark cluster with Zeppelin to query a MySQL database. When I tried to add a MySQL interpreter in Zeppelin, the option did not exist. I searched for a way to make the interpreter appear but didn't find a solution. How can I get a MySQL interpreter in Zeppelin so I can query the MySQL database?

Upvotes: 0

Views: 665

Answers (1)

Scott Hsieh

Reputation: 1493

Instead of a dedicated MySQL interpreter, you can query MySQL through the Spark interpreter in Zeppelin. Spark SQL supports many features of SQL:2003 and SQL:2011 [1][2], so you only need to add the MySQL JDBC connector as a dependency:

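  1. Download a MySQL JDBC connector (Connector/J) JAR whose version is compatible with your MySQL server. Connector/J 6.0 and later provide the driver class com.mysql.cj.jdbc.Driver used below; the older 5.1.x line uses com.mysql.jdbc.Driver instead.
  2. Add the JAR as a dependency of the Spark interpreter in Zeppelin. (I put the JAR on the master node.)
  3. You should now be able to read MySQL tables. The following example uses the Scala API: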

    import java.util.Properties

    /* Database configuration: replace the placeholder values with your own */
    val jdbcHostname = "HOST"
    val jdbcDatabase = "DATABASE"
    val jdbcURL = s"jdbc:mysql://$jdbcHostname/$jdbcDatabase"
    val jdbcUsername = "USERNAME"
    val jdbcPassword = "PASSWORD"
    val tableName = "TABLE_NAME"

    val connectionProperties = new Properties()
    connectionProperties.put("user", jdbcUsername)
    connectionProperties.put("password", jdbcPassword)
    connectionProperties.put("driver", "com.mysql.cj.jdbc.Driver")

    /* Read data from MySQL into a DataFrame */
    val desiredData = spark.read.jdbc(jdbcURL, tableName, connectionProperties)
    desiredData.printSchema()

    /* Register the DataFrame as a temporary view so Spark SQL can query it */
    desiredData.createOrReplaceTempView("desiredData")
    val query = """
    SELECT COUNT(*) AS `Record Number`
    FROM desiredData
    """
    spark.sql(query).show()

    val query2 = """
    SELECT ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column1, column2) AS column3
    FROM desiredData
    """
    spark.sql(query2).show()

    /* ... further queries ... */
    
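If you switch between Connector/J versions, the URL and driver setting above are easy to get out of sync. The following is a minimal sketch, assuming a hypothetical helper object MySqlJdbcConfig (not part of Spark or Zeppelin) that builds the JDBC URL and the Properties passed to spark.read.jdbc, choosing the driver class from the connector's major version. The default port 3306 is MySQL's standard port; change it if your server differs.

```scala
import java.util.Properties

// Hypothetical helper (not part of Spark or Zeppelin): builds the JDBC URL and
// connection properties that spark.read.jdbc expects for a MySQL source.
object MySqlJdbcConfig {

  // Connector/J 6.0 and later ship com.mysql.cj.jdbc.Driver;
  // the older 5.1.x line ships com.mysql.jdbc.Driver.
  def driverClass(connectorMajorVersion: Int): String =
    if (connectorMajorVersion >= 6) "com.mysql.cj.jdbc.Driver"
    else "com.mysql.jdbc.Driver"

  // Builds a URL of the form jdbc:mysql://host:port/database.
  def url(host: String, database: String, port: Int = 3306): String =
    s"jdbc:mysql://$host:$port/$database"

  // Collects user, password and driver class into a java.util.Properties.
  def properties(user: String, password: String, connectorMajorVersion: Int): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    props.setProperty("driver", driverClass(connectorMajorVersion))
    props
  }
}
```

In a Zeppelin Spark paragraph it could then be used as spark.read.jdbc(MySqlJdbcConfig.url("HOST", "DATABASE"), "TABLE_NAME", MySqlJdbcConfig.properties("USERNAME", "PASSWORD", 8)), where the quoted values are placeholders.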

Testing Notes:

  1. EMR: emr-5.10.0 with Pig 0.17.0, Zeppelin 0.7.3, and Spark 2.2.0
  2. MySQL: MariaDB 5.2.10

References

  1. Apache Hive (n.d.). Home. [online] Cwiki.apache.org. Available at: https://cwiki.apache.org/confluence/display/Hive/Home [Accessed 1 Dec. 2017].
  2. Apache Spark (n.d.). Compatibility with Apache Hive. [online] spark.apache.org. Available at: https://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive [Accessed 1 Dec. 2017].

Upvotes: 4
