Reputation: 1767
I’m trying to connect to a PostgreSQL database from Airflow, but I’m running into a problem I can’t figure out. tl;dr: the error I get when trying to create a Postgres table from my Airflow DAG is at the bottom of this post.
I’m using docker-compose with the puckel image, extended for my own use: l/custom_airflow:1.5 is just my adjusted image, which grabs the puckel image and pip-installs a few extra Python dependencies. Here is my docker-compose file:
version: '3.7'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    logging:
      options:
        max-size: 10m
        max-file: "3"
  db:
    image: postgres:13.0-alpine
    restart: always
    environment:
      POSTGRES_DB: postgres
      POSTGRES_USER: admin_user
      POSTGRES_PASSWORD: secret_password
      POSTGRES_HOST_AUTH_METHOD: trust
      # PGDATA: /var/lib/postgresql/data
    volumes:
      - db-data:/var/lib/postgresql/data
    ports:
      - "5439:5432"
  pgadmin:
    image: dpage/pgadmin4:4.27
    restart: always
    environment:
      PGADMIN_DEFAULT_EMAIL: [email protected]
      PGADMIN_DEFAULT_PASSWORD: test
      PGADMIN_LISTEN_PORT: 1111
    ports:
      - "1111:1111"
    volumes:
      - pgadmin-data:/var/lib/pgadmin
    links:
      - "db:pgsql-server"
  webserver:
    image: l/custom_airflow:1.5
    container_name: l_custom_airflow
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./db-data:/usr/local/airflow/db-data
      - ./pgadmin-data:/usr/local/airflow/pgadmin-data
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
volumes:
  db-data:
  pgadmin-data:
Although Airflow uses the postgres service to store its own metadata about DAGs, I created a second Postgres service called db so that my data is kept separate, and exposed it on port 5439.
This all seems to run fine. docker container ls shows that the Postgres container is named pipeline_5_db_1, and checking docker network ls and inspecting the containers with docker container inspect l_custom_airflow | grep Network shows that they are all attached to the same network.
In the Airflow UI I select Admin --> Connections --> Create and fill in the following (screenshot of the connection form omitted):
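The exact form values aren't visible here, but judging from the error further down, the connection points at host pipeline_5_db_1 on port 5439. For reference, Airflow can also read a connection from an AIRFLOW_CONN_&lt;CONN_ID&gt; environment variable holding a connection URI; a sketch of the equivalent URI, with the host and port assumed from the error message:

```python
# Sketch only: build the connection URI that Airflow's
# AIRFLOW_CONN_MY_PG_CONNECTION environment variable could hold.
# Host and port are assumed from the error message below.
user, password = "admin_user", "secret_password"
host, port, dbname = "pipeline_5_db_1", 5439, "postgres"

uri = f"postgres://{user}:{password}@{host}:{port}/{dbname}"
print(uri)
# e.g. export AIRFLOW_CONN_MY_PG_CONNECTION="<uri>"
```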
I then have my simple dag:
from airflow.models import DAG
from airflow.utils.dates import days_ago
from airflow.operators.python_operator import PythonOperator
from airflow.operators.postgres_operator import PostgresOperator

args = {'start_date': days_ago(1)}

dag = DAG(
    dag_id='testing_dag',
    default_args=args,
    schedule_interval=None
)

with dag:
    op1 = PostgresOperator(
        task_id="make_a_staging_table",
        postgres_conn_id="my_pg_connection",
        sql="""CREATE TABLE scaled_data
        (
            idx_col integer,
            petal_width float,
            petal_length float
        );
        """
    )

    op1
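As a quick local sanity check, independent of Airflow and Postgres, the DDL string passed to the operator can be exercised against an in-memory SQLite database, which happens to accept this subset of SQL:

```python
import sqlite3

# The same DDL string the PostgresOperator would run.
SQL = """CREATE TABLE scaled_data
(
    idx_col integer,
    petal_width float,
    petal_length float
);
"""

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(SQL)
# Read back the column names that were actually created.
cols = [row[1] for row in conn.execute("PRAGMA table_info(scaled_data)")]
print(cols)
```

This only validates the statement's syntax and column list; it says nothing about the connection problem itself.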
I want to test this out and try to make a table. But when I trigger the DAG, I get the following error in the logs:
[2020-10-29 08:26:53,156] {{taskinstance.py:1128}} ERROR - could not connect to server: Connection refused
Is the server running on host "pipeline_5_db_1" (172.22.0.2) and accepting
TCP/IP connections on port 5439?
I have searched, and this seems to be a common error, but I’m not sure what it means or how to solve it in my situation. I have tried setting the host to 0.0.0.0 and other minor adjustments, without success.
Upvotes: 5
Views: 9137
Reputation: 61
In this case you need to use the name of the service as the host, so when you are creating the connection set the host to db instead of pipeline_5_db_1. Note also that the "5439:5432" mapping only applies from the host machine; inside the Compose network the container still listens on its default port, 5432, so that is the port the connection should use.
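To make the two addresses concrete, here is a small sketch contrasting where the database is reachable from, using the names from the compose file above:

```python
# Sketch: the same Postgres container is reachable at different
# host/port pairs depending on where the client runs.
def pg_uri(host, port, user="admin_user", password="secret_password",
           dbname="postgres"):
    """Build a Postgres connection URI from its parts."""
    return f"postgres://{user}:{password}@{host}:{port}/{dbname}"

# Inside the Compose network (e.g. the Airflow webserver container):
inside = pg_uri("db", 5432)          # service name + container port
# From the host machine (e.g. a local psql or pgAdmin):
outside = pg_uri("localhost", 5439)  # published port from "5439:5432"

print(inside)
print(outside)
```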
I hope the answer hasn’t come too late :)
Upvotes: 3