Reputation: 4839
I am using python 2.7 with spark standalone cluster on client mode.
I want to use jdbc for mysql and found that i need to load it using --jars
argument, I have the jdbc on my local, and manage to load it with pyspark console like here
When I write a python script inside my ide, using pyspark, I don't manage to load the additional jar mysql-connector-java-5.1.26.jar
and keep get
no suitable driver
error
How can I load additional jar files when running a python script in client mode, using a standalone cluster on client mode and refering to a remote master?
edit: added some code ######################################################################### this is the basic code that i am using, i use pyspark with spark context in python e.g i do not use spark submit directly and don't understand how to use spark submit parameters in this case...
def createSparkContext(masterAdress = algoMaster):
"""
:return: return a spark context that is suitable for my configs
note the ip for the master
app name is not that important, just to show off
"""
from pyspark.mllib.util import MLUtils
from pyspark import SparkConf
from pyspark import SparkContext
import os
SUBMIT_ARGS = "--driver-class-path /var/nfs/general/mysql-connector-java-5.1.43 pyspark-shell"
#SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.11:1.2.0 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
conf = SparkConf()
#conf.set("spark.driver.extraClassPath", "var/nfs/general/mysql-connector-java-5.1.43")
conf.setMaster(masterAdress)
conf.setAppName('spark-basic')
conf.set("spark.executor.memory", "2G")
#conf.set("spark.executor.cores", "4")
conf.set("spark.driver.memory", "3G")
conf.set("spark.driver.cores", "3")
#conf.set("spark.driver.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")
sc = SparkContext(conf=conf)
print sc._conf.get("spark.executor.extraClassPath")
return sc
sql = SQLContext(sc)
df = sql.read.format('jdbc').options(url='jdbc:mysql://ip:port?user=user&password=pass', dbtable='(select * from tablename limit 100) as tablename').load()
print df.head()
Thanks
Upvotes: 1
Views: 1931
Reputation: 10086
Your SUBMIT_ARGS
is going to be passed to the spark-submit
when creating a sparkContext from python. You should use --jars
instead of --driver-class-path
.
EDIT
Your problem is actually a lot simpler than it seems: you're missing the parameter driver
in the options:
sql = SQLContext(sc)
df = sql.read.format('jdbc').options(
url='jdbc:mysql://ip:port',
user='user',
password='pass',
driver="com.mysql.jdbc.Driver",
dbtable='(select * from tablename limit 100) as tablename'
).load()
You can also put user
and password
in separate arguments.
Upvotes: 2