user193616

Reputation: 23

Adding a connection to Airflow via command line for hive_cli fails

I am running Apache Airflow 1.8 and trying to add a connection for the Hive client wrapper (hive_cli) through the command-line interface. I run this command:

airflow connections -a --conn_id HIVE_CONN2 --conn_uri hive_cli://hiveserver/default

The command line reports success, but the Conn Type is not set correctly in the Airflow UI and the connection won't work.

I think the error is related to the underscore in the URI prefix (scheme). I have confirmed that the urlparse function, which models.py uses to split the URI, doesn't accept underscores in the scheme.

Other than setting it manually in the UI, is there another way to add connections to Airflow? Is this a defect? Airflow should not use underscores in connection types, to avoid this issue.

Upvotes: 2

Views: 8544

Answers (2)

Ash Berlin-Taylor

Reputation: 4058

This was fixed in Airflow 1.9.0, which added some extra arguments to the connections subcommand:

airflow connections -a --conn_id hive_cli_test --conn_type hive_cli --conn_host something/something
[2018-08-09 10:28:41,377] {__init__.py:51} INFO - Using executor SequentialExecutor

        Successfully added `conn_id`=hive_cli_test : hive_cli://:@something/something:

Upvotes: 4

Davos

Reputation: 5415

You're right.

The conn_type is used to determine which hook to use as an interface to an external data source / sink.

conn_type is either extracted from the URI as you've specified correctly above, or from a connection created in the UI (and stored in the connection table in the Meta DB).

In your case, the conn_type is extracted from the supplied URL using the parse_from_uri method in models.py, which sets the conn_type from the scheme returned by the urlparse method. https://github.com/apache/incubator-airflow/blob/master/airflow/models.py

According to https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse the scheme is extracted from the first part of the URI.

And as you found, the urlparse method doesn't return a scheme when there's an underscore in the url before the ://.

To verify this, try variations on this URI with and without the underscore:

from urllib.parse import urlparse

# Print each parsed component in order: scheme, netloc, path, params, query, fragment
for part in urlparse("hive_cli://hiveserver/default"):
    print(part)
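As a rough sketch of the difference, the snippet below parses the same URI with and without the underscore. The helper conn_type_from_uri is a hypothetical stand-in for the step described above (taking the connection type from the parsed scheme); it is not Airflow's actual code.

```python
from urllib.parse import urlparse


def conn_type_from_uri(uri):
    """Hypothetical helper: use the parsed URI scheme as the connection type."""
    return urlparse(uri).scheme


with_underscore = urlparse("hive_cli://hiveserver/default")
without_underscore = urlparse("hivecli://hiveserver/default")

# With the underscore, no scheme or host is recognised and the whole
# string ends up in the path component.
print(repr(with_underscore.scheme), repr(with_underscore.netloc), with_underscore.path)

# Without it, the URI splits into scheme, host and path.
print(repr(without_underscore.scheme), without_underscore.netloc, without_underscore.path)
```

An empty scheme here would explain why the Conn Type shows up unset even though the CLI reports success.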

It works slightly differently if you use beeline, as it will create a JDBC connection, but if you're not using beeline (I can see you aren't because it would be part of the --conn_extra in the command) then it runs a subprocess.

Following the code, the hive_cli type is ultimately run through subprocess.Popen, i.e. directly on the Airflow machine (or worker), not via JDBC or some other connection.

https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L208
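To illustrate what "run as a subprocess" means here, this is a minimal sketch of launching a command-line client as a local child process and capturing its output. The run_cli helper is hypothetical and is not the hook's actual code; the Python interpreter stands in for the hive binary so the sketch runs without Hive installed.

```python
import subprocess
import sys


def run_cli(cmd):
    """Hypothetical helper: run `cmd` as a local child process.

    Returns a tuple of (exit_code, combined stdout/stderr text).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
    )
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8")


# Stand-in for a command such as ["hive", "-e", "SHOW TABLES"] (illustrative only).
code, output = run_cli([sys.executable, "-c", "print('ran locally')"])
print(code, output.strip())
```

Nothing in this flow opens a network connection from a URL; the command simply executes on whichever machine runs the task.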

So it doesn't really need a URL-style connection string; it only uses that format to shoehorn into the airflow connections --conn_uri option. Since it never gets pieced back together as a URL, the choice to call it hive_cli appears arbitrary, and it doesn't work from the Airflow CLI. It all works when you use the UI because the UI form sets the conn_type directly when it constructs the connection.

It's a bug: the type name should be changed from hive_cli to hivecli, or to something else that is descriptive and compatible with urlparse.
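A quick way to check whether a candidate type name would survive urlparse is to test it against the scheme grammar from RFC 3986 (a letter followed by letters, digits, "+", "-" or "."). This is only a sketch: is_valid_scheme is a hypothetical helper, and hive-cli is included purely as an illustration, not as a name Airflow uses.

```python
import re
from urllib.parse import urlparse

# RFC 3986 scheme grammar: a letter, then letters, digits, "+", "-" or ".".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name):
    """Hypothetical helper: True if `name` could be used as a URI scheme."""
    return SCHEME_RE.fullmatch(name) is not None


for candidate in ["hive_cli", "hivecli", "hive-cli"]:
    parsed = urlparse(candidate + "://hiveserver/default")
    # Show the grammar check next to the scheme urlparse actually extracts.
    print(candidate, is_valid_scheme(candidate), repr(parsed.scheme))
```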

Upvotes: 2
