Herwini

Reputation: 437

PySpark in Docker container: PostgreSQL database connection

I am trying to connect to a Postgres database on localhost:5432 of my computer using PySpark inside a Docker container. I use VS Code, which builds and runs the container automatically. This is the code I have:

from pyspark.sql import SparkSession

password = ...
user = ...
url = 'jdbc:postgresql://127.0.0.1:5432/postgres'

spark = SparkSession.builder.config("spark.jars", "/opt/spark/jars/postgresql-42.2.5.jar") \
    .appName("PySpark_Postgres_test").getOrCreate()

df = spark.read.format("jbdc") \
    .option("url", url) \
    .option("dbtable", 'chicago_crime') \
    .option("user", user) \
    .option("password", password) \
    .option("driver", "org.postgresql.Driver") \
    .load()

I keep getting the same error:

"An error occurred while calling o358.load.\n: java.lang.ClassNotFoundException: \nFailed to find data source: jbdc. ...

Maybe the URL is not correct?

url = 'jdbc:postgresql://127.0.0.1:5432/postgres'

The database listens on port 5432 and is named postgres. It runs on my machine's localhost, but since I am working inside a Docker container, I assumed the correct way would be to enter the IP address of my laptop's localhost, 127.0.0.1. If I typed localhost, it would refer to the localhost of the Docker container. Or should I use the IPv4 address (Wireless LAN .. or WSL)?
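One way to narrow this down is to check, from inside the container, which host name actually accepts TCP connections on port 5432. The sketch below uses only Python's standard socket module. The helper name can_reach is hypothetical, and it only tests that the port is open, not that PostgreSQL accepts the credentials.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and names that do not resolve.
        return False
```

For example, the results of can_reach("127.0.0.1", 5432) and can_reach("host.docker.internal", 5432) can be compared. The second name is an assumption that holds on Docker Desktop, where it resolves to the host machine.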

Does anyone know what's wrong?

P.S. One of the commands in my Dockerfile is the following:

RUN wget https://jdbc.postgresql.org/download/postgresql-42.2.5.jar -P /opt/spark/jars

Upvotes: 2

Views: 1308

Answers (1)

Herwini

Reputation: 437

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.jars", "/opt/spark/jars/postgresql-42.2.5.jar") \
    .getOrCreate()
    
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://host.docker.internal:5432/postgres") \
    .option("dbtable", "chicago_crime") \
    .option("user", "postgres") \
    .option("password", "postgres") \
    .option("driver", "org.postgresql.Driver") \
    .load()
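Compared with the code in the question, this snippet differs in two places: the format string is spelled "jdbc" rather than "jbdc", and the URL points at host.docker.internal instead of 127.0.0.1. Inside a container, 127.0.0.1 is the container's own loopback interface, while host.docker.internal is the name Docker Desktop provides for the host machine (on a plain Linux engine it generally has to be added explicitly, for example with --add-host=host.docker.internal:host-gateway). As a minimal sketch, the same options can be collected in one place so the host is easy to swap. The helper names jdbc_options and read_table are hypothetical, not part of PySpark.

```python
def jdbc_options(host, port, database, table, user, password):
    """Collect the options passed to Spark's JDBC data source."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def read_table(spark, options):
    """Load one table through Spark's JDBC data source."""
    # The format name must be spelled "jdbc"; a misspelling such as "jbdc"
    # produces the "Failed to find data source" error shown in the question.
    return spark.read.format("jdbc").options(**options).load()
```

With an existing session, read_table(spark, jdbc_options("host.docker.internal", 5432, "postgres", "chicago_crime", "postgres", "postgres")) would issue the same read as above.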

Upvotes: 5
