thebeancounter

Reputation: 4839

How to load --jars with pyspark on a Spark standalone cluster in client mode

I am using Python 2.7 with a Spark standalone cluster in client mode.

I want to use JDBC for MySQL and found that I need to load the connector using the --jars argument. I have the JDBC jar locally and manage to load it with the pyspark console, like here.

When I write a Python script inside my IDE, using pyspark, I don't manage to load the additional jar mysql-connector-java-5.1.26.jar and keep getting a

no suitable driver

error

How can I load additional jar files when running a Python script on a standalone cluster in client mode, referring to a remote master?

Edit: added some code. This is the basic code that I am using: I use pyspark with a SparkContext in Python, i.e. I do not use spark-submit directly, and I don't understand how to use spark-submit parameters in this case...

def createSparkContext(masterAdress = algoMaster):
    """
    :return: return a spark context that is suitable for my configs 
     note the ip for the master 
     app name is not that important, just to show off 
    """
    from pyspark.mllib.util import MLUtils
    from pyspark import SparkConf
    from pyspark import SparkContext
    import os


    SUBMIT_ARGS = "--driver-class-path /var/nfs/general/mysql-connector-java-5.1.43 pyspark-shell"
    #SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.11:1.2.0 pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
    conf = SparkConf()
    #conf.set("spark.driver.extraClassPath", "var/nfs/general/mysql-connector-java-5.1.43")
    conf.setMaster(masterAdress)
    conf.setAppName('spark-basic')
    conf.set("spark.executor.memory", "2G")
    #conf.set("spark.executor.cores", "4")
    conf.set("spark.driver.memory", "3G")
    conf.set("spark.driver.cores", "3")
    #conf.set("spark.driver.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")
    sc = SparkContext(conf=conf)
    print sc._conf.get("spark.executor.extraClassPath")

    return sc


from pyspark.sql import SQLContext

sql = SQLContext(sc)
df = sql.read.format('jdbc').options(url='jdbc:mysql://ip:port?user=user&password=pass', dbtable='(select * from tablename limit 100) as tablename').load()
print df.head()

Thanks

Upvotes: 1

Views: 1931

Answers (1)

MaFF

Reputation: 10086

Your SUBMIT_ARGS is passed to spark-submit when a SparkContext is created from Python. You should use --jars instead of --driver-class-path.
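
A minimal sketch of the same setup with --jars, assuming the connector sits at the NFS path from your question (the .jar extension and the master URL are placeholders I filled in):

import os
from pyspark import SparkConf, SparkContext

# Set the submit args before the SparkContext is created; --jars ships the
# connector to both the driver and the executors, while --driver-class-path
# only touches the driver's classpath.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars /var/nfs/general/mysql-connector-java-5.1.43.jar pyspark-shell"
)

conf = SparkConf()
conf.setMaster("spark://master-ip:7077")  # placeholder for your remote master
conf.setAppName("spark-basic")
sc = SparkContext(conf=conf)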

EDIT

Your problem is actually a lot simpler than it seems: you're missing the driver parameter in the options:

sql = SQLContext(sc)
df = sql.read.format('jdbc').options(
    url='jdbc:mysql://ip:port', 
    user='user',
    password='pass',
    driver="com.mysql.jdbc.Driver",
    dbtable='(select * from tablename limit 100) as tablename'
).load()

You can also pass user and password as separate options, as shown above, instead of embedding them in the URL.
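
For completeness, a sketch of the URL-embedded variant (host, port, and credentials are placeholders); it should also work once driver is specified, since "No suitable driver" comes from DriverManager not knowing which driver class handles the URL:

df = sql.read.format('jdbc').options(
    url='jdbc:mysql://ip:port?user=user&password=pass',  # credentials kept inside the JDBC URL
    driver="com.mysql.jdbc.Driver",
    dbtable='(select * from tablename limit 100) as tablename'
).load()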

Upvotes: 2
