Hassan Aftab

Reputation: 49

Read data from MySQL in spark-shell

First I built an sbt Scala application to read data from a MySQL table in Apache Spark, using the following code.

val spark = SparkSession.builder()
    .master("local")
    .appName("Fuzzy Match Analysis")
    .config("spark.sql.warehouse.dir", "file:///tmp/spark-warehouse")
    .getOrCreate()
import spark.implicits._
var df = spark.read.format("jdbc")
    .option("url", "jdbc:mysql://127.0.0.1:3306")
    .option("dbtable","(select * from engine.ecl limit 1) as onerow")
    .option("user", "root")
    .option("driver", "com.mysql.jdbc.Driver")
    .option("password", "root")
    .load()

Now I am trying to read data from the MySQL table in spark-shell using the same method:

scala> var df = spark.read.format("jdbc").option("url","jdbc:mysql://127.0.0.1:3306").option("user","root").option("password","root").option("driver","com.mysql.jdbc.Driver").option("dbtable","(select * from engine.ecl limit 1) as onerow")
df: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@22e253c7

But when I try to see what's in the row using the df.show() method, it gives me the following error.

scala> df.show()
<console>:26: error: value show is not a member of org.apache.spark.sql.DataFrameReader
       df.show()

I have tried resolving the driver dependency by adding these lines to

\spark-2.2.0-bin-hadoop2.7\conf\spark-defaults.conf.template

spark.driver.extraClassPath = C:/Program Files/MySQL/mysql-connector-java-5.1.36.jar
spark.executor.extraClassPath = C:/Program Files/MySQL/mysql-connector-java-5.1.36.jar

Upvotes: 1

Views: 4659

Answers (2)

y2k-shubham

Reputation: 11607

From what I can see (and as the error message also indicates), the var df in your spark-shell refers to a DataFrameReader and not a DataFrame, because you haven't invoked the load() method of DataFrameReader.

df: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@22e253c7


As a good practice, always specify types explicitly (even though Scala infers them automatically) in order to avoid gotchas like this.
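
For example, here is a minimal sketch of the corrected spark-shell code from the question, with an explicit type annotation added; with the annotation, forgetting load() becomes an immediate type mismatch instead of a confusing error later:

    import org.apache.spark.sql.DataFrame

    // Without .load(), the right-hand side is a DataFrameReader,
    // so this annotation would fail to compile, surfacing the bug early.
    val df: DataFrame = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://127.0.0.1:3306")
      .option("user", "root")
      .option("password", "root")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "(select * from engine.ecl limit 1) as onerow")
      .load()  // load() is what actually returns the DataFrame

    df.show()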

Upvotes: 3

Hassan Aftab

Reputation: 49

To integrate MySQL with Apache Spark, we need to follow these steps:

  1. To work with a MySQL server in Apache Spark we need Connector/J, MySQL's JDBC driver. Download mysql-connector-java-5.1.36, then add its classpath to conf/spark-defaults.conf:

    spark.driver.extraClassPath = C:/ProgramFiles/MySQL/mysql-connector-java-5.1.36.jar
    spark.executor.extraClassPath = C:/ProgramFiles/MySQL/mysql-connector-java-5.1.36.jar

  2. You need to create a new instance of the JDBC driver with this call:

    Class.forName("com.mysql.jdbc.Driver").newInstance

  3. Following the Spark JDBC data source documentation, you can run a MySQL query in spark-shell:

    val jdbcDF = spark.read.format("jdbc")
        .options(Map(
          "url" -> "jdbc:mysql://localhost:3306/engine?user=root&password=root",
          "dbtable" -> "engine.ecl",
          "fetchSize" -> "10"))
        .load()

or, using option-by-option calls as in the question:

    var df = spark.read.format("jdbc")
        .option("url", "jdbc:mysql://127.0.0.1:3306")
        .option("user", "root")
        .option("password", "root")
        .option("driver", "com.mysql.jdbc.Driver")
        .option("dbtable", "(select * from engine.ecl limit 1) as onerow")
        .load()
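
Either way, df is now a DataFrame rather than a DataFrameReader, so actions work as expected. A quick sketch to verify (the output columns depend on your engine.ecl schema):

    df.show()        // prints the single row selected by the query
    df.printSchema() // prints the inferred column names and types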

Upvotes: 2
