Reputation: 630
Does anyone know if it is possible to programmatically import/export Data Fusion pipelines (deployed or in draft status)?
The idea is to write a script that drops and recreates a Data Fusion instance, in order to avoid billing when it's not used. With the gcloud command line it's possible to provision a Data Fusion instance and to destroy it, but I would also like to automatically export and import all my pipelines.
The official documentation, unfortunately, didn't help me...
Thanks!
Upvotes: 3
Views: 3088
Reputation: 314
You can export/import pipelines using a script.
The script should start with the auth part:
export AUTH_TOKEN=$(gcloud auth print-access-token)
export INSTANCE_ID=****
export CDAP_ENDPOINT=$(gcloud beta data-fusion instances describe \
--project=**** \
--location=**** \
--format="value(apiEndpoint)" \
${INSTANCE_ID})
Then, GET the list of applications in your instance; count is the number of apps and apps is the list of their names:
json=$(curl -s -X GET -H "Authorization: Bearer ${AUTH_TOKEN}" "${CDAP_ENDPOINT}/v3/namespaces/default/apps")
count=$(echo "$json" | jq 'length')
apps=$(echo "$json" | jq -r '.[].name')
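To see what the two jq filters do without calling the instance, here is a small sketch run against a made-up response. The sample JSON is an assumption and is heavily trimmed; a real apps list carries more fields per entry.

```shell
# Hypothetical, trimmed sample of the apps-list response.
sample='[{"name":"Pipeline_1"},{"name":"Pipeline_2"}]'

# Number of elements in the top-level array.
sample_count=$(printf '%s' "$sample" | jq 'length')

# One app name per line; -r prints raw strings without JSON quotes.
sample_apps=$(printf '%s' "$sample" | jq -r '.[].name')

echo "$sample_count"
echo "$sample_apps"
```

Quoting the jq filter keeps the shell from treating the brackets in .[].name as a glob pattern.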
Finally, loop over the apps, GET the JSON of each one, and write it to a local directory:
# Make sure the output directory exists and start the progress counter
mkdir -p exported-pipelines
n=1
for app in $apps
do
  echo "$n/$count"
  ((n=n+1))
  appjson=$(curl -s -X GET -H "Authorization: Bearer ${AUTH_TOKEN}" "${CDAP_ENDPOINT}/v3/namespaces/default/apps/$app")
  echo "$appjson" > "exported-pipelines/${app}.json"
done
Whole script:
export AUTH_TOKEN=$(gcloud auth print-access-token)
export INSTANCE_ID=****
export CDAP_ENDPOINT=$(gcloud beta data-fusion instances describe \
  --project=**** \
  --location=**** \
  --format="value(apiEndpoint)" \
  "${INSTANCE_ID}")
# List all apps in the default namespace
json=$(curl -s -X GET -H "Authorization: Bearer ${AUTH_TOKEN}" "${CDAP_ENDPOINT}/v3/namespaces/default/apps")
count=$(echo "$json" | jq 'length')
apps=$(echo "$json" | jq -r '.[].name')
# Make sure the output directory exists
mkdir -p exported-pipelines
n=1
for app in $apps
do
  # Print progress as current/total
  echo "$n/$count"
  ((n=n+1))
  appjson=$(curl -s -X GET -H "Authorization: Bearer ${AUTH_TOKEN}" "${CDAP_ENDPOINT}/v3/namespaces/default/apps/$app")
  echo "$appjson" > "exported-pipelines/${app}.json"
done
# Wait for Enter before exiting
read
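For the import direction, a rough sketch of the reverse loop might look like the following. It assumes the CDAP deploy call is a PUT to the same apps path, and that each file is already in the deploy shape (an artifact object plus a parsed config object). The raw app-detail response stores the pipeline config as a string under configuration, so it may need reshaping first. The function names app_url and import_pipelines are made up for this example.

```shell
# Build the deploy URL for one exported file.
# Assumption: the app name is the file name without its .json suffix.
app_url() {
  local endpoint="$1" file="$2"
  printf '%s/v3/namespaces/default/apps/%s\n' "$endpoint" "$(basename "$file" .json)"
}

# Deploy every JSON file in a directory to the instance behind CDAP_ENDPOINT.
# Assumption: CDAP_ENDPOINT is set as in the export script.
import_pipelines() {
  local dir="$1" token file
  token="$(gcloud auth print-access-token)"
  for file in "$dir"/*.json; do
    [ -e "$file" ] || continue   # skip when the glob matched nothing
    echo "Deploying $(basename "$file" .json)"
    curl -s -X PUT \
      -H "Authorization: Bearer ${token}" \
      -H "Content-Type: application/json" \
      -d @"$file" \
      "$(app_url "$CDAP_ENDPOINT" "$file")"
  done
}
```

Once CDAP_ENDPOINT points at the new instance, calling import_pipelines exported-pipelines would attempt to deploy each file in that directory.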
Upvotes: 1
Reputation: 3500
You could use the REST API to do this. However, you would probably need a script that does it automatically given the instance URL. You should be able to get the pipeline config from the application list API (reference here). In your case you first need to get the list of pipelines (reference here), then iterate through all pipelines and get the details of each individual pipeline, which will have a property called configuration containing the pipeline config JSON. You still have to create a new JSON with the name, description, and artifact information, along with a config property holding the configuration JSON you received from the backend.
A sample would look like this:
1. Get the list of pipelines by passing artifactName=cdap-data-pipeline,cdap-data-streams as a query parameter: /namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams
2. Get the details of an individual pipeline: /namespaces/default/apps/<app-name>
3. For each app, get the configuration property in the response and form your final JSON to something like,
{
"name": "Pipeline_1",
"description": "Pipeline to do taskX",
"artifact": {
"name": "cdap-data-pipeline",
"version": "6.1.0-SNAPSHOT",
"scope": "USER"
},
"config": JSON.parse(<configuration-from-app-detailed-api>)
}
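The JSON.parse step above can be done with jq's fromjson filter. The sketch below is one way to reshape an app-detail response into that final JSON. The function name to_deploy_payload and the sample response are made up for illustration, and the sample is heavily trimmed compared with what the backend returns.

```shell
# Reshape one app-detail response (read from stdin) into the final JSON.
# fromjson parses the string stored under "configuration" into a JSON object.
to_deploy_payload() {
  jq '{name, description, artifact, config: (.configuration | fromjson)}'
}

# Hypothetical, trimmed app-detail response used only for illustration.
detail='{"name":"Pipeline_1","description":"Pipeline to do taskX","artifact":{"name":"cdap-data-pipeline","version":"6.1.0-SNAPSHOT","scope":"USER"},"configuration":"{\"stages\":[]}","programs":[]}'

printf '%s' "$detail" | to_deploy_payload
```

Fields that are not listed in the filter, such as programs in the sample, are dropped from the output.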
One thing to note: if you have set up schedules or triggers for pipelines in the old cluster, those won't be created in the new cluster. The rest of the pipeline should just work if you are only deploying and running it.
Hope this helps.
Just realized there are docs on accessing the REST API for Data Fusion here. However, they don't really talk about HOW to make the REST API call. Here is an example of how to do it:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -w "\n" -X GET "<instance-url>/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams"
Here we use gcloud to get an access token for authenticating against that instance. A prerequisite is to be signed in with the gcloud SDK. Once authentication succeeds, this should return the list of apps in your instance.
Upvotes: 5