Reputation: 85
I have an Airflow machine running apache-airflow==1.10.5. I know how to run a DAG that automatically creates a cluster, runs the step, and terminates the cluster; using connections in the Airflow UI I am able to achieve this. But to run a DAG on an existing AWS EMR cluster, I cannot figure out which parameters I need to pass in the connection.
Airflow UI --> Admin --> Connections --> created Conn Id (EMR Default1), Conn Type: Elastic MapReduce.
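For reference, the create-and-terminate case I have working looks roughly like this (heavily simplified; the cluster config and step definition below are placeholders, not my real ones):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator
from airflow.contrib.operators.emr_terminate_job_flow_operator import EmrTerminateJobFlowOperator

# simplified placeholder for the real spark-submit step
SPARK_STEPS = [{
    "Name": "spark_step",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "--deploy-mode", "cluster", "s3://my-bucket/job.py"],
    },
}]

dag = DAG("emr_create_run_terminate",
          start_date=datetime(2019, 10, 1),
          schedule_interval=None)

create_cluster = EmrCreateJobFlowOperator(
    task_id="create_cluster",
    aws_conn_id="aws_default",
    emr_conn_id="emr_default",          # the Elastic MapReduce connection set up in the UI
    job_flow_overrides={"Name": "airflow-cluster"},
    dag=dag)

add_step = EmrAddStepsOperator(
    task_id="add_step",
    job_flow_id="{{ task_instance.xcom_pull(task_ids='create_cluster', key='return_value') }}",
    aws_conn_id="aws_default",
    steps=SPARK_STEPS,
    dag=dag)

terminate_cluster = EmrTerminateJobFlowOperator(
    task_id="terminate_cluster",
    job_flow_id="{{ task_instance.xcom_pull(task_ids='create_cluster', key='return_value') }}",
    aws_conn_id="aws_default",
    dag=dag)

create_cluster >> add_step >> terminate_cluster
```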
[2019-10-14 12:12:40,919] {taskinstance.py:1051} ERROR - Parameter validation failed:
Missing required parameter in input: "Instances"
Traceback (most recent call last):
File "/root/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 926, in _run_raw_task
result = task_copy.execute(context=context)
File "/root/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/contrib/operators/emr_create_job_flow_operator.py", line 68, in execute
response = emr.create_job_flow(self.job_flow_overrides)
File "/root/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/contrib/hooks/emr_hook.py", line 55, in create_job_flow
response = self.get_conn().run_job_flow(**config)
File "/root/anaconda3/envs/airflow/lib/python3.6/site-packages/botocore/client.py", line 314, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/root/anaconda3/envs/airflow/lib/python3.6/site-packages/botocore/client.py", line 586, in _make_api_call
api_params, operation_model, context=request_context)
File "/root/anaconda3/envs/airflow/lib/python3.6/site-packages/botocore/client.py", line 621, in _convert_to_request_dict
api_params, operation_model)
File "/root/anaconda3/envs/airflow/lib/python3.6/site-packages/botocore/validate.py", line 291, in serialize_to_request
raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Missing required parameter in input: "Instances"
[2019-10-14 12:12:40,920] {taskinstance.py:1082} INFO - Marking task as FAILED.
Upvotes: 1
Views: 1609
Reputation: 4698
In the first case, instead of dynamically creating/terminating clusters through the UI, you can also achieve it by extending the SparkSubmitOperator. After launching the EMR cluster you can copy the *.xml files (e.g. core-site.xml) from the EMR master into some location on the Airflow node and then point to those files in your spark-submit task in Airflow. At least that's how we do it in our product. To extend that logic, if you're planning to reuse an existing cluster, all you need to know is where those *.xml files are already stored; the rest stays the same, you only refer to those files when triggering the task.
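A rough sketch of that idea, assuming a spark_default connection whose master is yarn and SSH access to the EMR master (host names, paths and task ids below are placeholders):

```python
import subprocess
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

EMR_MASTER = "ec2-xx-xx-xx-xx.compute-1.amazonaws.com"   # master of the existing cluster
CONF_DIR = "/opt/emr-conf"                               # local dir on the Airflow node

dag = DAG("spark_on_existing_emr",
          start_date=datetime(2019, 10, 1),
          schedule_interval=None)


def fetch_emr_conf(**_):
    # pull the *-site.xml files from the EMR master onto the Airflow node
    subprocess.check_call([
        "scp", "-i", "/path/to/key.pem",
        "hadoop@{}:/etc/hadoop/conf/*-site.xml".format(EMR_MASTER),
        CONF_DIR])


fetch_conf = PythonOperator(
    task_id="fetch_emr_conf",
    python_callable=fetch_emr_conf,
    dag=dag)

spark_task = SparkSubmitOperator(
    task_id="spark_on_emr",
    application="/path/to/job.py",
    conn_id="spark_default",            # spark connection pointing at yarn
    env_vars={
        "HADOOP_CONF_DIR": CONF_DIR,    # tell spark-submit where the copied configs live
        "YARN_CONF_DIR": CONF_DIR},
    dag=dag)

fetch_conf >> spark_task
```

Whether env_vars reaches the spark-submit process this way depends on the Airflow version, so treat that part as an assumption; exporting HADOOP_CONF_DIR/YARN_CONF_DIR in the Airflow node's environment works as well.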
More details
I don't know of any such doc, so I can only suggest that you explore the following, based on the knowledge I have gathered:
We need to write a custom plugin for spark-submit. As part of this custom-plugin module, let's define a CustomSparkSubmitOperator class. It needs to extend the BaseOperator. You can find plenty of articles on writing custom plugins in Airflow; those are a good place to start, and the Airflow docs cover the BaseOperator in more detail.
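A bare skeleton of what that could look like (class and argument names here are hypothetical; the individual methods are sketched further below):

```python
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class CustomSparkSubmitOperator(BaseOperator):

    @apply_defaults
    def __init__(self,
                 emr_cluster_id,                  # id of the existing EMR cluster (j-XXXXXXXX)
                 application,                     # path to the Spark application to submit
                 local_conf_dir="/opt/emr-conf",  # where the *-site.xml files get copied
                 terminate_cluster=False,
                 *args, **kwargs):
        super(CustomSparkSubmitOperator, self).__init__(*args, **kwargs)
        self.emr_cluster_id = emr_cluster_id
        self.application = application
        self.local_conf_dir = local_conf_dir
        self.terminate_cluster = terminate_cluster

    def pre_execute(self, context):
        # wait for the cluster and pull the *-site.xml files (see sketch below)
        pass

    def execute(self, context):
        # submit the Spark job through SparkSubmitHook (see sketch below)
        pass

    def post_execute(self, context, result=None):
        # optionally terminate the cluster (see sketch below)
        pass
```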
In BaseOperator, you'll find a method called pre_execute. It's a viable option to perform the following actions inside this method (a sketch follows the list):
a. Wait until your cluster is up. You can easily do that using boto3, if you pass the cluster-id.
b. Once the cluster is up, get the IP of the EMR master node and copy the files matching /etc/hadoop/conf/*-site.xml to your Airflow node. It's doable via a subprocess call in Python.
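A sketch of those two pre_execute steps, assuming boto3 credentials on the Airflow node and SSH access to the EMR master (the key path is a placeholder):

```python
import subprocess

import boto3


# method of the CustomSparkSubmitOperator sketched above
def pre_execute(self, context):
    emr = boto3.client("emr")

    # a. block until the cluster reaches a running/waiting state
    emr.get_waiter("cluster_running").wait(ClusterId=self.emr_cluster_id)

    # b. find the master node and copy the Hadoop/YARN config files locally
    master_dns = emr.describe_cluster(
        ClusterId=self.emr_cluster_id)["Cluster"]["MasterPublicDnsName"]
    subprocess.check_call([
        "scp", "-i", "/path/to/key.pem",
        "hadoop@{}:/etc/hadoop/conf/*-site.xml".format(master_dns),
        self.local_conf_dir])
```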
Once you've got the xml files, just use the SparkSubmitHook in the execute method to submit your Spark job. You need to make sure the spark binaries on your Airflow node use this path for spark-submit.
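A sketch of the execute part using SparkSubmitHook from airflow.contrib (the spark_default connection with master set to yarn is an assumption):

```python
import os

from airflow.contrib.hooks.spark_submit_hook import SparkSubmitHook


# method of the CustomSparkSubmitOperator sketched above
def execute(self, context):
    # make sure spark-submit on the Airflow node picks up the copied configs
    os.environ["HADOOP_CONF_DIR"] = self.local_conf_dir
    os.environ["YARN_CONF_DIR"] = self.local_conf_dir

    hook = SparkSubmitHook(
        conn_id="spark_default",   # spark connection whose master is 'yarn'
        name=self.task_id)
    hook.submit(application=self.application)
```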
You can clean up the cluster in the post_execute method, in case that's required.
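And the optional cleanup, using the terminate_cluster flag from the skeleton above (only relevant if this operator also owns the cluster's lifecycle):

```python
import boto3


# method of the CustomSparkSubmitOperator sketched above
def post_execute(self, context, result=None):
    if self.terminate_cluster:
        boto3.client("emr").terminate_job_flows(JobFlowIds=[self.emr_cluster_id])
```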
Upvotes: 1