Reputation: 49
First I built an sbt Scala application to read data from a MySQL table in Apache Spark using this code:
val spark = SparkSession.builder()
  .master("local")
  .appName("Fuzzy Match Analysis")
  .config("spark.sql.warehouse.dir", "file:///tmp/spark-warehouse")
  .getOrCreate()
import spark.implicits._

var df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://127.0.0.1:3306")
  .option("dbtable", "(select * from engine.ecl limit 1) as onerow")
  .option("user", "root")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("password", "root")
  .load()
Now I am trying to read data from the MySQL table in spark-shell using the same method:
scala> var df = spark.read.format("jdbc").option("url","jdbc:mysql://127.0.0.1:3306").option("user","root").option("password","root").option("driver","com.mysql.jdbc.Driver").option("dbtable","(select * from engine.ecl limit 1) as onerow")
df: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@22e253c7
But when I try to see what's in the row using the df.show() method, it gives me the following error:
scala> df.show()
<console>:26: error: value show is not a member of org.apache.spark.sql.DataFrameReader
df.show()
I have tried resolving dependencies by adding these lines to conf/spark-defaults.conf:
spark.driver.extraClassPath = C:/Program Files/MySQL/mysql-connector-java-5.1.36.jar
spark.executor.extraClassPath = C:/Program Files/MySQL/mysql-connector-java-5.1.36.jar
Upvotes: 1
Views: 4659
Reputation: 11607
From what I can see (and also indicated by the error message), the var df in your spark-shell refers to a DataFrameReader and not a DataFrame, because you haven't invoked the load() method of DataFrameReader.
df: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@22e253c7
As good practice, always make sure to specify types (even though Scala infers them automatically) in order to avoid gotchas like this.
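For illustration, here is a minimal sketch (reusing the connection options from the question) of how an explicit type annotation turns this mistake into a compile-time error:

// Annotating the expected type catches the missing load() immediately:
// without load(), the right-hand side is a DataFrameReader, so the
// assignment fails to compile instead of failing later at df.show().
val df: org.apache.spark.sql.DataFrame = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://127.0.0.1:3306")
  .option("user", "root")
  .option("password", "root")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "(select * from engine.ecl limit 1) as onerow")
  .load() // returns a DataFrame; show() is now available
df.show()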
Upvotes: 3
Reputation: 49
To set up the MySQL integration with Apache Spark, we need to follow these steps.
To work with a MySQL server in Apache Spark we need Connector/J for MySQL. Download mysql-connector-java-5.1.36, then add its class path to conf/spark-defaults.conf:
spark.driver.extraClassPath = C:/Program Files/MySQL/mysql-connector-java-5.1.36.jar
spark.executor.extraClassPath = C:/Program Files/MySQL/mysql-connector-java-5.1.36.jar
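Alternatively (an equivalent setup, assuming the jar sits at the same path), the connector jar can be supplied when the shell is launched:

spark-shell --driver-class-path "C:/Program Files/MySQL/mysql-connector-java-5.1.36.jar" --jars "C:/Program Files/MySQL/mysql-connector-java-5.1.36.jar"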
You need to create a new instance of the JDBC driver using this call:
Class.forName("com.mysql.jdbc.Driver").newInstance
Following either of the examples below, you can run a MySQL query in spark-shell:
val jdbcDF = spark.read.format("jdbc")
  .options(Map(
    "url" -> "jdbc:mysql://localhost:3306/engine?user=root&password=root",
    "dbtable" -> "engine.ecl",
    "fetchSize" -> "10"))
  .load()
or
var df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://127.0.0.1:3306")
  .option("user", "root")
  .option("password", "root")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "(select * from engine.ecl limit 1) as onerow")
  .load()
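With load() invoked this time, df is a DataFrame, so the call that failed in the question now works:

df.show() // prints the single row selected by the LIMIT 1 subquery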
Upvotes: 2