Reputation: 275
I want to connect MySQL with PySpark. I am using a Jupyter notebook to run PySpark. However, when I do this:
dataframe_mysql = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/playground",
    driver="com.mysql.jdbc.Driver",
    dbtable="play1",
    user="root",
    password="sp123").load()
I get this error:
Py4JJavaError: An error occurred while calling o89.load. : java.lang.ClassNotFoundException: com.mysql.jdbc.Driver.
How can I resolve this error and load the MySQL data into a PySpark DataFrame?
Upvotes: 8
Views: 15053
Reputation: 381
I used the code below in PyCharm and it worked for me.
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

# Create a SparkSession with the MySQL connector jar on the classpath.
# The path is a raw string so the backslashes in the Windows path are
# not treated as escape sequences ("\u..." would be a syntax error).
spark = SparkSession.builder.master("local").appName("Examples.com") \
    .config("spark.jars", r"C:\user\Software\mysql-connector-java-8.0.15.jar") \
    .getOrCreate()

df_mysql = spark.read.format("jdbc") \
    .option("url", "jdbc:mysql://localhost:3306/classicmodels") \
    .option("driver", "com.mysql.cj.jdbc.Driver") \
    .option("dbtable", "orders") \
    .option("user", "root") \
    .option("password", "password") \
    .load()
print(df_mysql.count())
Upvotes: 0
Reputation: 1
I know I'm a couple of years late, but in case someone ends up here by googling for an answer, I'll explain what I did to fix this.
What you need is to include the jar package that connects to the database. As someone suggested, you could include it via extraClassPath, but in reality all you need is to include it in your SparkSession config. It looks like this:
from pyspark.sql import SparkSession

# Repeated .config() calls with the same key overwrite each other, so
# list every package in one comma-separated spark.jars.packages value.
spark = SparkSession \
    .builder \
    .config("spark.jars.packages",
            "org.apache.hadoop:hadoop-aws:2.7.0,mysql:mysql-connector-java:8.0.17") \
    .getOrCreate()
This way the jar package is pulled from a Maven repository, and by the time you reference it in the driver option it is already loaded.
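To round this out, here is a minimal sketch of a read against that session; the driver class matches the 8.x connector pulled in above, and the connection details in <> are placeholders, not values from the original answer:
# <database_name>, <table_name>, <user>, <password> are placeholders.
# com.mysql.cj.jdbc.Driver is the driver class shipped with the 8.x connector.
df = spark.read.format("jdbc") \
    .option("url", "jdbc:mysql://localhost:3306/<database_name>") \
    .option("driver", "com.mysql.cj.jdbc.Driver") \
    .option("dbtable", "<table_name>") \
    .option("user", "<user>") \
    .option("password", "<password>") \
    .load()
df.show()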
Upvotes: 0
Reputation: 31
I use this Python script:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName('test') \
    .master('local[*]') \
    .config("spark.driver.extraClassPath", "<path to mysql-connector-java-5.1.49-bin.jar>") \
    .getOrCreate()

df = spark.read.format("jdbc") \
    .option("url", "jdbc:mysql://localhost/<database_name>") \
    .option("driver", "com.mysql.jdbc.Driver") \
    .option("dbtable", "<table_name>") \
    .option("user", "<user>") \
    .option("password", "<password>") \
    .load()
Replace anything in <> with your own parameters.
Upvotes: 3
Reputation: 2088
pyspark
Install the MySQL Java connector driver via Maven/Gradle, or download the jar file directly. Then provide the jar path to pyspark with the --jars
argument. If you chose the Maven approach, it looks like this for MySQL connector version 8.0.11:
pyspark --jars "${HOME}/.m2/repository/mysql/mysql-connector-java/8.0.11/mysql-connector-java-8.0.11.jar"
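As a quick sanity check inside the shell that starts, under the assumption that --jars populated the spark.jars setting, you can print it before attempting a read:
# `spark` is predefined in the pyspark shell. The --jars flag is carried
# into the spark.jars config entry, so printing it confirms the connector
# jar landed on the classpath.
print(spark.sparkContext.getConf().get("spark.jars", ""))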
findspark
Using add_packages, provide the MySQL driver, like this:
import findspark
findspark.add_packages('mysql:mysql-connector-java:8.0.11')
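Note that these calls only set the submit arguments, so they have to run before pyspark is imported and the session starts. A minimal end-to-end sketch, with placeholder connection details in <>:
import findspark

# Both calls must happen before pyspark is imported / the JVM starts.
findspark.add_packages('mysql:mysql-connector-java:8.0.11')
findspark.init()

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[*]').getOrCreate()

# <database_name>, <table_name>, <user>, <password> are placeholders.
df = spark.read.format('jdbc') \
    .option('url', 'jdbc:mysql://localhost:3306/<database_name>') \
    .option('driver', 'com.mysql.cj.jdbc.Driver') \
    .option('dbtable', '<table_name>') \
    .option('user', '<user>') \
    .option('password', '<password>') \
    .load()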
Upvotes: 1