Reputation: 47
Hello, I installed PySpark, and I have a local PostgreSQL database that I browse in DBeaver. How can I connect to Postgres from PySpark?
I tried this:
from pyspark.sql import DataFrameReader
url = 'postgresql://localhost:5432/coucou'
properties = {'user': 'postgres', 'password': 'admin'}
df = DataFrameReader(sqlContext).jdbc(
url='jdbc:%s' % url, table='tw_db', properties=properties
)
but I get this error:
File "C:\Spark\spark-3.1.2-bin-hadoop3.2\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o37.jdbc.
: java.lang.ClassNotFoundException: C:/Users/Desktop/postgresql-42.2.23.jre7.jar
Upvotes: 0
Views: 457
Reputation: 15318
You need to add the jars you want to use when creating the SparkSession.
See the Spark documentation on advanced dependency management: https://spark.apache.org/docs/2.4.7/submitting-applications.html#advanced-dependency-management
Either when you start pyspark:
pyspark --repositories MAVEN_REPO
# OR
pyspark --jars PATH_TO_JAR
or when you create your SparkSession object:
SparkSession.builder.master("yarn").appName(app_name).config("spark.jars.packages", "MAVEN_PACKAGE")
# OR
SparkSession.builder.master("yarn").appName(app_name).config("spark.jars", "PATH_TO_JAR")
Use Maven packages when you do not have the jar locally, or when your jar has dependencies of its own.
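Putting the two steps together for the PostgreSQL case in the question, a minimal sketch might look like the following. The driver version (42.2.23), database name, table name, and credentials are taken from the question or assumed; adjust them to your setup. This needs a running Postgres instance to actually return rows, so treat it as a connection template rather than a ready-made script.

```python
from pyspark.sql import SparkSession

# Pull the PostgreSQL JDBC driver from Maven Central when the session starts.
# (Driver version 42.2.23 is an assumption; any recent version should work.)
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("postgres-example")
    .config("spark.jars.packages", "org.postgresql:postgresql:42.2.23")
    .getOrCreate()
)

# Read the table over JDBC. The "driver" property tells Spark which class
# to load from the jar that was just added to the classpath.
df = spark.read.jdbc(
    url="jdbc:postgresql://localhost:5432/coucou",
    table="tw_db",
    properties={
        "user": "postgres",
        "password": "admin",
        "driver": "org.postgresql.Driver",
    },
)
df.show()
```

Note that with this approach you do not need `DataFrameReader(sqlContext)` at all; `spark.read.jdbc` on the SparkSession is the idiomatic entry point in Spark 3.x.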
Upvotes: 1