Eya

Apache Airflow and Apache Hive data warehouse connection

I have a Hive data warehouse on my localhost (OS: Ubuntu). I want to retrieve raw data from it, do some data processing, and then compute features for an ML pipeline.

To do so, I am using the dockerized version of Airflow, and I want to connect Airflow to Hive.

The connection does not seem to be established correctly in the first place, even though the first task of my DAG, which is the one that establishes the connection, succeeds whenever I test it.
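One way to tell whether the problem is the network path rather than the DAG itself is a plain TCP check run from inside the Airflow container. The sketch below is a minimal example, assuming HiveServer2 listens on its default port 10000 on the Ubuntu host and that the host is reached from the container as host.docker.internal. On Linux, Docker Engine does not define host.docker.internal by default; it usually has to be mapped, for example with extra_hosts: "host.docker.internal:host-gateway" in docker-compose.yaml. The function name hive_port_reachable is hypothetical, and a True result only shows that the port is open, not that Hive accepts the credentials.

```python
import socket


def hive_port_reachable(host: str = "host.docker.internal", port: int = 10000,
                        timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Assumption: HiveServer2 listens on its default port 10000 on the host.
    This only checks that the port is open; it does not authenticate to Hive.
    """
    try:
        # Resolve the name first, so a missing host.docker.internal mapping
        # is reported separately from a refused or timed-out connection.
        socket.getaddrinfo(host, port)
    except socket.gaierror as exc:
        print(f"cannot resolve {host!r}: {exc}")
        return False
    try:
        # Open and immediately close a TCP connection to the target port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        print(f"cannot connect to {host}:{port}: {exc}")
        return False
```

Calling hive_port_reachable() in a Python task, or in a Python shell inside the container, prints the resolution or connection error when either step fails.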

I created an extended Airflow image to install the Apache Hive provider, including:

- apache-airflow-providers-apache-hive

Then I built the image (after making the necessary changes in the docker-compose.yaml file).
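To confirm that the extended image actually contains the provider, a quick check can be run with the container's Python interpreter. This is a sketch assuming that the pip package apache-airflow-providers-apache-hive installs the module airflow.providers.apache.hive; the helper name provider_installed is hypothetical.

```python
import importlib.util


def provider_installed(module: str = "airflow.providers.apache.hive") -> bool:
    """Return True if `module` can be found in the current Python environment.

    Intended to be run inside the Airflow container to check that the
    extended image really ships the Hive provider.
    """
    try:
        # find_spec imports the parent packages of a dotted name and returns
        # None if the last component cannot be found.
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # A parent package (for example `airflow` itself) is missing.
        return False
```

Run inside one of the Airflow containers, provider_installed() should return True only if docker-compose.yaml is really using the extended image rather than the stock one.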

In the DAG, I also refer to my localhost as host.docker.internal when establishing the connection to Hive.
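Airflow can also read a connection from an environment variable named AIRFLOW_CONN_<CONN_ID> that holds a connection URI, which can be set in the environment section of docker-compose.yaml instead of being created inside a DAG task. The sketch below builds such a URI for a HiveServer2 connection pointing at host.docker.internal. It assumes the hiveserver2 connection type, the default port 10000 and the default Hive database; the helper hive_conn_uri and the credentials are placeholders.

```python
from urllib.parse import quote


def hive_conn_uri(login: str, password: str,
                  host: str = "host.docker.internal", port: int = 10000,
                  schema: str = "default") -> str:
    """Build an Airflow connection URI for a HiveServer2 connection.

    Assumptions: conn type `hiveserver2`, HiveServer2 on its default
    port 10000, and the `default` Hive database.
    """
    # Percent-encode credentials so characters such as '@' or ':'
    # do not break the URI structure.
    user = quote(login, safe="")
    pwd = quote(password, safe="")
    return f"hiveserver2://{user}:{pwd}@{host}:{port}/{schema}"
```

For instance, the resulting string could be assigned to AIRFLOW_CONN_HIVESERVER2_DEFAULT, since hiveserver2_default is the connection ID the provider's HiveServer2Hook uses by default.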
