Reputation: 163
I just started using Spark SQL to load data from an H2 database. Here is what I did, following the Spark SQL documentation:
>>> sqlContext = SQLContext(sc)
>>> df = sqlContext.load(source="jdbc",driver="org.h2.Driver", url="jdbc:h2:~/test", dbtable="RAWVECTOR")
But it failed with errors; I think the problem is that the username and password are not specified in the call.
These are the parameters from the Spark SQL 1.3.1 documentation:
url
    The JDBC URL to connect to.
dbtable
    The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.
driver
    The class name of the JDBC driver needed to connect to this URL. This class will be loaded on the master and workers before running any JDBC commands, to allow the driver to register itself with the JDBC subsystem.
partitionColumn, lowerBound, upperBound, numPartitions
    These options must all be specified if any of them is specified. They describe how to partition the table when reading in parallel from multiple workers. partitionColumn must be a numeric column from the table in question.
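To illustrate the last group of options, a partitioned read could be sketched with an options dict like the one below. The column name "ID" and the bounds are my assumptions, not taken from the question's schema; partitionColumn must be a numeric column of the table being read.

```python
# Sketch of options for a partitioned JDBC load in Spark SQL 1.3.x.
# "ID", the bounds, and numPartitions are placeholder values.
options = {
    "source": "jdbc",
    "driver": "org.h2.Driver",
    "url": "jdbc:h2:tcp://localhost/~/test",
    "dbtable": "RAWVECTOR",
    "partitionColumn": "ID",   # numeric column to split on
    "lowerBound": "1",         # minimum value of partitionColumn
    "upperBound": "100000",    # maximum value of partitionColumn
    "numPartitions": "4",      # number of parallel reads
}

# With a live SparkContext, the call would be:
# df = sqlContext.load(**options)
```

Spark splits the partitionColumn range [lowerBound, upperBound] into numPartitions slices, and each worker reads one slice.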
But I couldn't find any clue about how to pass the database username and password to sqlContext.load. Has anyone seen a similar case or have any clues?
Thanks.
Upvotes: 3
Views: 9942
Reputation: 163
I figured it out. Just do:
df = sqlContext.load(
    source="jdbc", driver="org.h2.Driver",
    url="jdbc:h2:tcp://localhost/~/test?user=sa&password=1234",
    dbtable="RAWVECTOR"
)
And when you create the database, use the same pattern:
conn = DriverManager.getConnection(
    "jdbc:h2:tcp://localhost/~/" + dbName + "?user=sa&password=1234",
    null, null
);
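To keep the two URLs in sync, the pattern can be captured in a small helper. This is just a sketch; the function name is mine, not part of any Spark or H2 API.

```python
# Hypothetical helper that builds an H2 JDBC URL with embedded
# credentials, matching the user=...&password=... pattern used above.
def h2_url(db_name, user, password, host="localhost"):
    return "jdbc:h2:tcp://{0}/~/{1}?user={2}&password={3}".format(
        host, db_name, user, password)

url = h2_url("test", "sa", "1234")
# url == "jdbc:h2:tcp://localhost/~/test?user=sa&password=1234"
```

The same string can then be passed as the url argument to sqlContext.load.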
And here is a blog about how to use the API.
Upvotes: 5