fffrost

Reputation: 1767

Can't connect to postgres database in airflow using docker

I’m trying to connect to a PostgreSQL database from Airflow but I’m running into a problem I can’t figure out. tl;dr: the error I get when trying to create a Postgres table in my Airflow DAG is at the bottom of this post.

I’m using docker-compose with the puckel image, but I’ve extended it for my own use: l/custom_airflow:1.5 is just the puckel image with a few extra Python dependencies pip-installed on top. Here is my docker-compose file:

version: '3.7'
services:
    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
        logging:
            options:
                max-size: 10m
                max-file: "3"

    db:
        image: postgres:13.0-alpine
        restart: always
        environment:
            POSTGRES_DB: postgres
            POSTGRES_USER: admin_user
            POSTGRES_PASSWORD: secret_password
            POSTGRES_HOST_AUTH_METHOD: trust
            # PGDATA: /var/lib/postgresql/data
        volumes:
            - db-data:/var/lib/postgresql/data
        ports:
            - "5439:5432"
         
    pgadmin:
        image: dpage/pgadmin4:4.27
        restart: always
        environment:
            PGADMIN_DEFAULT_EMAIL: [email protected]
            PGADMIN_DEFAULT_PASSWORD: test
            PGADMIN_LISTEN_PORT: 1111
        ports:
        - "1111:1111"
        volumes:
            - pgadmin-data:/var/lib/pgadmin
        links:
            - "db:pgsql-server"

    webserver:
        image: l/custom_airflow:1.5
        container_name: l_custom_airflow
        restart: always
        depends_on:
            - postgres
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
        logging:
            options:
                max-size: 10m
                max-file: "3"
        volumes:
            - ./dags:/usr/local/airflow/dags
            - ./db-data:/usr/local/airflow/db-data
            - ./pgadmin-data:/usr/local/airflow/pgadmin-data
        ports:
            - "8080:8080"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3

volumes:
    db-data: 
    pgadmin-data:

Although Airflow uses the postgres service to store its own metadata about DAGs, I create a second Postgres service called db so that my data is kept separate, and map it to port 5439. This all seems to run fine: docker container ls shows that the postgres container is named pipeline_5_db_1, and checking docker network ls and inspecting the containers with docker container inspect l_custom_airflow | grep Network shows that they are all on the same network. In the Airflow UI I select Admin --> Connections --> Create and fill in the following:

(screenshot: the filled-in Airflow connection form)
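As a sanity check, the connection fields map onto a Postgres URI. A minimal sketch (the helper is hypothetical; the credential values come from the db service in the compose file above). Note that inside the compose network a container connects to the service's container port, 5432; the 5439 side of the "5439:5432" mapping only applies from the host machine:

```python
def pg_conn_uri(user: str, password: str, host: str, port: int, dbname: str) -> str:
    # Hypothetical helper: builds a standard Postgres connection URI
    # from the same fields the Airflow connection form asks for.
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"

# From another container on the same compose network (e.g. the webserver),
# use the service name and the container port:
inside_docker = pg_conn_uri("admin_user", "secret_password", "db", 5432, "postgres")

# From the host machine (e.g. psql or pgAdmin running outside Docker),
# use localhost and the published port:
from_host = pg_conn_uri("admin_user", "secret_password", "localhost", 5439, "postgres")
```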

I then have my simple dag:

from airflow.models import DAG
from airflow.utils.dates import days_ago
from airflow.operators.postgres_operator import PostgresOperator

args = {'start_date': days_ago(1)}

dag = DAG(
    dag_id='testing_dag',
    default_args=args,
    schedule_interval=None,
)

with dag:
    op1 = PostgresOperator(
        task_id="make_a_staging_table",
        postgres_conn_id="my_pg_connection",
        sql="""CREATE TABLE scaled_data
        (
            idx_col integer,
            petal_width float,
            petal_length float
        );
        """,
    )

I want to test this out and try to make a table. But when I trigger the DAG, I get the following error in the logs:

[2020-10-29 08:26:53,156] {{taskinstance.py:1128}} ERROR - could not connect to server: Connection refused
    Is the server running on host "pipeline_5_db_1" (172.22.0.2) and accepting
    TCP/IP connections on port 5439?

I have searched and this seems like a common error, but I’m not sure what it means or how to solve it in my situation. I have tried setting the host to 0.0.0.0 and other minor adjustments, without success.

Upvotes: 5

Views: 9137

Answers (1)

In this case, you need to use the name of the service as the host, so when you are creating the connection set the host to db instead of pipeline_5_db_1. Also set the port to 5432 rather than 5439: from inside the Docker network you connect to the container's own port, and the "5439:5432" mapping only applies when connecting from the host machine.
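Alternatively, instead of filling in the form in the UI, Airflow can pick the connection up from an environment variable named AIRFLOW_CONN_&lt;CONN_ID&gt; containing a connection URI. A sketch of what that could look like in the webserver service of the compose file (the connection id my_pg_connection is taken from the question's DAG; the credentials come from the db service):

```yaml
    webserver:
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
            # Airflow reads connections from AIRFLOW_CONN_<CONN_ID> URIs;
            # note the service name "db" and the container port 5432.
            - AIRFLOW_CONN_MY_PG_CONNECTION=postgresql://admin_user:secret_password@db:5432/postgres
```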

I hope the answer has not come too late :)

Upvotes: 3
