satyanarayana kota

Reputation: 11

Airflow HDFS Sensor

I'm trying to get HdfsSensor working. I have set up the HDFS connection and the file is there, but the sensor keeps poking for the file and never completes. The log shows:

Poking for file hdfs://user/airflow/stamps/test/ds=2018-10-15/_SUCCESS

The code is below:

hdfs_sense_open = HdfsSensor(
        task_id='hdfs_sense_open',
        filepath='hdfs://user/airflow/stamps/test/ds=2018-10-15/_SUCCESS',
        hdfs_conn_id='hdfs_leo',
        dag=dag)

Actually, it works without the file name in the path. One more point: when you create the HDFS connection, you need to use the HDFS port, not the WebHDFS port, i.e. 8020 (or maybe 9000 if it's localhost) rather than a WebHDFS port like 50070.

hdfs_sense_open = HdfsSensor(
        task_id='hdfs_sense_open',
        filepath='/user/airflow/stamps/test/ds=2018-10-15/',
        hdfs_conn_id='hdfs_leo',
        dag=dag)
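
To make the port advice concrete, here is a minimal sketch of a guard that builds the connection URI and rejects WebHDFS ports. The function name `build_hdfs_conn_uri` and the host `namenode.example.com` are hypothetical, and 9870 is included on the assumption that a Hadoop 3 cluster uses it as its default NameNode HTTP port. Exporting the result as `AIRFLOW_CONN_HDFS_LEO` relies on Airflow reading connections from `AIRFLOW_CONN_<CONN_ID>` environment variables; creating the connection in the UI works just as well.

```python
import os

# Ports that serve WebHDFS / the NameNode web UI rather than HDFS RPC.
# 50070 comes from the note above; 9870 is assumed for Hadoop 3 clusters.
WEBHDFS_PORTS = {50070, 9870}


def build_hdfs_conn_uri(host, port=8020):
    """Return an hdfs:// connection URI, refusing known WebHDFS ports."""
    if port in WEBHDFS_PORTS:
        raise ValueError(
            "port %d looks like a WebHDFS port; use the HDFS port "
            "(for example 8020, or 9000 on localhost)" % port
        )
    return "hdfs://%s:%d" % (host, port)


# Hypothetical host name, used only for illustration.
conn_uri = build_hdfs_conn_uri("namenode.example.com")

# The connection id 'hdfs_leo' maps to the variable AIRFLOW_CONN_HDFS_LEO.
os.environ["AIRFLOW_CONN_HDFS_LEO"] = conn_uri
```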

Thank you so much, both of you, for trying to help me out.

Upvotes: 1

Views: 5076

Answers (1)

dlamblin

Reputation: 45361

Try setting the filepath without the protocol, like this:

hdfs_sense_open = HdfsSensor(
        task_id='hdfs_sense_open',
        filepath='/user/airflow/stamps/test/ds=2018-10-15/_SUCCESS',
        hdfs_conn_id='hdfs_leo',
        dag=dag)
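
If the paths come from somewhere else with the prefix already attached, a small helper can strip it before the value reaches the sensor. This is only a sketch: `strip_hdfs_scheme` is a hypothetical name, and it assumes the `hdfs://` prefix was put directly in front of an absolute path, as in the question. It does not handle URIs that carry a host and port such as `hdfs://namenode:8020/...`.

```python
def strip_hdfs_scheme(filepath):
    """Drop a leading 'hdfs://' and return an absolute path.

    Assumes no host:port follows the scheme; a path that does not
    start with 'hdfs://' is returned unchanged.
    """
    prefix = "hdfs://"
    if filepath.startswith(prefix):
        # Remove the scheme, then make sure exactly one leading slash remains.
        return "/" + filepath[len(prefix):].lstrip("/")
    return filepath


# The path from the question, converted to the form the sensor accepts.
clean_path = strip_hdfs_scheme(
    "hdfs://user/airflow/stamps/test/ds=2018-10-15/_SUCCESS"
)
```

The result can then be passed as `filepath=clean_path` when constructing the sensor.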

Upvotes: 1
