Reputation: 23
I am running Apache Airflow 1.8 and trying to add connections via the command line interface for a Hive client wrapper. However, when I run the command
airflow connections -a --conn_id HIVE_CONN2 --conn_uri hive_cli://hiveserver/default
the command line reports success, but the Conn Type is not set correctly in the Airflow UI and the connection won't work.
I think the error is related to the underscore in the URI prefix (scheme). I have confirmed that the urlparse function used to split the URI in models.py doesn't allow underscores.
Other than setting it manually in the UI, is there another approach to adding connections to Airflow? Is this a defect? Airflow should not use underscores in connection types to avoid this issue.
Upvotes: 2
Views: 8544
Reputation: 4058
This has been fixed in Airflow 1.9.0 with the addition of some extra arguments to the connections sub-command:
airflow connections -a --conn_id hive_cli_test --conn_type hive_cli --conn_host something/something
[2018-08-09 10:28:41,377] {__init__.py:51} INFO - Using executor SequentialExecutor
Successfully added `conn_id`=hive_cli_test : hive_cli://:@something/something:
Upvotes: 4
Reputation: 5415
You're right.
The conn_type is used to determine which hook to use as an interface to an external data source/sink. The conn_type is either extracted from the URI, as you've correctly described above, or taken from a connection created in the UI (and stored in the connection table in the metadata DB).
In your case, the conn_type is extracted from the supplied URI by the parse_from_uri method in models.py, which sets the conn_type from the scheme returned by the urlparse method.
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py
According to https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse, the scheme is extracted from the first part of the URI. And as you found, the urlparse method doesn't return a scheme when there's an underscore in the URL before the ://.
e.g. to verify this, try variations on this URI with and without the underscore:
from urllib.parse import urlparse
for part in urlparse("hive_cli://hiveserver/default"):
    print(part)
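To make the difference explicit, here is a small side-by-side comparison (hiveserver and default are just the placeholder host and schema from the question):

```python
from urllib.parse import urlparse

# Underscore in the scheme: urlparse gives up on the scheme and
# treats the whole string as a path, so no scheme or netloc is extracted.
bad = urlparse("hive_cli://hiveserver/default")
print(bad.scheme)   # ''
print(bad.netloc)   # ''

# Without the underscore, the scheme and netloc parse as expected.
good = urlparse("hivecli://hiveserver/default")
print(good.scheme)  # 'hivecli'
print(good.netloc)  # 'hiveserver'
```

Since Airflow takes conn_type from that empty scheme, the connection ends up with no usable type.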
It works slightly differently if you use beeline, since that creates a JDBC connection, but if you're not using beeline (I can see you aren't, because it would appear in the --conn_extra part of the command) then it runs a subprocess. Following the code, the hive_cli type is ultimately run via subprocess.Popen, i.e. directly on the Airflow machine (or worker), not via JDBC or some other network connection.
https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L208
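For illustration, a hook that works this way boils down to something like the following sketch (run_cli, the hive_bin default, and the -e flag layout are my assumptions for illustration, not Airflow's actual code):

```python
import subprocess

def run_cli(hql, hive_bin="hive"):
    """Minimal sketch of a hive_cli-style hook: it shells out to a
    local binary on the worker rather than opening a network
    connection to the host named in the URI."""
    proc = subprocess.Popen(
        [hive_bin, "-e", hql],        # run the statement via the CLI binary
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, _ = proc.communicate()
    return proc.returncode, out.decode()
```

This is why the hive_cli "connection" is really just configuration for a local process, not a URL that ever gets dialled.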
So it doesn't really need a URL-type connection string; it's just using that format to shoehorn itself into the airflow connections --conn_uri option. Since the URI never gets pieced back together as a URL, the choice to call the type hive_cli appears arbitrary, and it doesn't work from the Airflow CLI. It all works from the UI because the UI form constructs the connection by specifying the conn_type directly.
It's a bug; the type name should be changed from hive_cli to hivecli, or something else that is descriptive and compatible with urlparse.
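A quick way to check whether any given conn_type would survive the --conn_uri round-trip is to test it against urlparse directly (the helper name is mine):

```python
from urllib.parse import urlparse

def is_urlparse_safe(conn_type):
    """True if conn_type comes back intact as a URI scheme.
    RFC 3986 schemes may only contain letters, digits, '+', '-'
    and '.', which is why 'hive_cli' is silently dropped."""
    return urlparse(conn_type + "://host/schema").scheme == conn_type
```

Any lowercase type containing only those characters is safe; hive_cli fails the check, while hivecli or hive-cli would pass.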
Upvotes: 2