Reputation: 275
I want to connect MySQL with PySpark. I am using a Jupyter notebook to run PySpark. However, when I do this:
dataframe_mysql = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/playground",
    driver="com.mysql.jdbc.Driver",
    dbtable="play1",
    user="root",
    password="sp123").load()
I get this error:
Py4JJavaError: An error occurred while calling o89.load. : java.lang.ClassNotFoundException: com.mysql.jdbc.Driver.
How can I resolve this error and load the MySQL data into a PySpark DataFrame?
Upvotes: 8
Views: 15053
Reputation: 381
I used the code below in PyCharm and it worked for me.
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

# Create a SparkSession with the MySQL connector jar on the classpath.
# The path is a raw string so the backslashes in the Windows path are
# not treated as escape sequences ("\u..." would be a syntax error).
spark = SparkSession.builder.master("local").appName("Examples.com") \
    .config("spark.jars", r"C:\user\Software\mysql-connector-java-8.0.15.jar") \
    .getOrCreate()

df_mysql = spark.read.format("jdbc") \
    .option("url", "jdbc:mysql://localhost:3306/classicmodels") \
    .option("driver", "com.mysql.cj.jdbc.Driver") \
    .option("dbtable", "orders") \
    .option("user", "root") \
    .option("password", "password") \
    .load()
print(df_mysql.count())
Upvotes: 0
Reputation: 1
I know I'm a couple of years late, but in case someone ends up here by googling for an answer, I'll explain what I did to fix this.
What you need is to include the jar package that connects to the database. As someone suggested, you could include it via extraClassPath, but in reality all you need is to include it in your SparkSession config. It looks like this:
from pyspark.sql import SparkSession

# Repeated .config() calls with the same key overwrite each other, so
# list every package in one comma-separated spark.jars.packages value.
spark = SparkSession \
    .builder \
    .config("spark.jars.packages",
            "org.apache.hadoop:hadoop-aws:2.7.0,mysql:mysql-connector-java:8.0.17") \
    .getOrCreate()
This way the jar package is pulled from a Maven repository, and by the time you reference it in the driver option it is already loaded.
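To round this out, here is a minimal sketch of a read against that session; the driver class matches the 8.x connector pulled in above, and the connection details in <> are placeholders, not values from the original answer:
# <database_name>, <table_name>, <user>, <password> are placeholders.
# com.mysql.cj.jdbc.Driver is the driver class shipped with the 8.x connector.
df = spark.read.format("jdbc") \
    .option("url", "jdbc:mysql://localhost:3306/<database_name>") \
    .option("driver", "com.mysql.cj.jdbc.Driver") \
    .option("dbtable", "<table_name>") \
    .option("user", "<user>") \
    .option("password", "<password>") \
    .load()
df.show()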
Upvotes: 0
Reputation: 31
I use this Python script:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName('test') \
    .master('local[*]') \
    .config("spark.driver.extraClassPath", "<path to mysql-connector-java-5.1.49-bin.jar>") \
    .getOrCreate()

df = spark.read.format("jdbc") \
    .option("url", "jdbc:mysql://localhost/<database_name>") \
    .option("driver", "com.mysql.jdbc.Driver") \
    .option("dbtable", "<table_name>") \
    .option("user", "<user>") \
    .option("password", "<password>") \
    .load()
Replace anything in <> with your own parameters.
Upvotes: 3
Reputation: 2088
pyspark
Install the MySQL Java connector driver via Maven/Gradle, or download the jar file directly. Then provide the jar path to pyspark with the --jars
argument. If you chose the Maven approach, it looks like this for MySQL connector version 8.0.11:
pyspark --jars "${HOME}/.m2/repository/mysql/mysql-connector-java/8.0.11/mysql-connector-java-8.0.11.jar"
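As a quick sanity check inside the shell that starts, under the assumption that --jars populated the spark.jars setting, you can print it before attempting a read:
# `spark` is predefined in the pyspark shell. The --jars flag is carried
# into the spark.jars config entry, so printing it confirms the connector
# jar landed on the classpath.
print(spark.sparkContext.getConf().get("spark.jars", ""))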
findspark
Using add_packages, provide the MySQL driver, like this:
import findspark
findspark.add_packages('mysql:mysql-connector-java:8.0.11')
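Note that these calls only set the submit arguments, so they have to run before pyspark is imported and the session starts. A minimal end-to-end sketch, with placeholder connection details in <>:
import findspark

# Both calls must happen before pyspark is imported / the JVM starts.
findspark.add_packages('mysql:mysql-connector-java:8.0.11')
findspark.init()

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[*]').getOrCreate()

# <database_name>, <table_name>, <user>, <password> are placeholders.
df = spark.read.format('jdbc') \
    .option('url', 'jdbc:mysql://localhost:3306/<database_name>') \
    .option('driver', 'com.mysql.cj.jdbc.Driver') \
    .option('dbtable', '<table_name>') \
    .option('user', '<user>') \
    .option('password', '<password>') \
    .load()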
Upvotes: 1