Vinod Jayachandran

Reputation: 3898

GCP Dataflow extract JOB_ID

For a Dataflow job, I need to extract the JOB_ID given the job's NAME. Below is the command I run and its output. How can I extract the JOB_ID from this response?

$ gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job"
JOB_ID                                    NAME                        TYPE       CREATION_TIME        STATE    REGION
2020-10-07_10_11_20-15879763245819496196  sample-job                  Streaming  2020-10-07 17:11:21  Running  us-central1

A Python script that does this would also be fine.

Upvotes: 0

Views: 1792

Answers (4)

Abel Matos

Reputation: 181

Use the --format flag so that gcloud prints only the job ID:

gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job" --format="value(JOB_ID)"
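If you need the ID inside a Python script, one option is to run that same gcloud command with subprocess and read its standard output. This is a minimal sketch, assuming gcloud is installed, on the PATH, and already authenticated. The helper name get_job_ids and its injectable run parameter (there only so the function can be exercised without gcloud) are hypothetical. Because value(...) prints one line per matching job, the sketch returns a list.

```python
import subprocess
from typing import Callable, List


def get_job_ids(
    job_name: str,
    region: str = "us-central1",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Return the IDs of active Dataflow jobs named job_name (hypothetical helper)."""
    # Passing an argv list means no shell quoting is needed around the filter.
    cmd = [
        "gcloud", "dataflow", "jobs", "list",
        f"--region={region}",
        "--status=active",
        f"--filter=name={job_name}",
        "--format=value(JOB_ID)",  # same format string as the command above
    ]
    # check=True raises CalledProcessError if gcloud exits with a non-zero status.
    result = run(cmd, capture_output=True, text=True, check=True)
    # value(...) prints one line per matching job; drop any blank lines.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

For the job in the question, get_job_ids("sample-job") should return a one-element list holding its JOB_ID; an empty list means no active job with that name was found.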

Upvotes: 2

Vinod Jayachandran

Reputation: 3898

I created a GitHub Gist with a Python script that does this.

Upvotes: 1

rsantiago

Reputation: 2099

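With Python, you can retrieve the list of jobs with a REST request to the Dataflow API's jobs list method, https://dataflow.googleapis.com/v1b3/projects/{projectId}/jobs (or its regional variant, .../projects/{projectId}/locations/{location}/jobs, which the code below uses).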

Then parse the JSON response and keep only the job with the name you are searching for, using an if clause:

if job["name"] == 'sample-job':

I tested this approach and it worked:

import requests

base_url = 'https://dataflow.googleapis.com/v1b3/projects/'
project_id = '<MY_PROJECT_ID>'
location = '<REGION>'

response = requests.get(
    f'{base_url}{project_id}/locations/{location}/jobs',
    headers={'Authorization': 'Bearer <BEARER_TOKEN_HERE>'},
)
# <BEARER_TOKEN_HERE> can be obtained by running 'gcloud auth print-access-token'
# as an account that has access to Dataflow jobs.
# Another authentication mechanism can be found in the link provided by danielm.
response.raise_for_status()  # fail on 401/403/404 instead of parsing an error body

jobslist = response.json()

# The jobs are under the "jobs" key; other keys such as "nextPageToken" are not job lists.
for job in jobslist.get('jobs', []):
    if job['name'] == 'beamapp-0907191546-413196':
        print(job['name'], " Found, job ID:", job['id'])
    else:
        print(job['name'], " Not matched")

# Output:
# windowedwordcount-0908012420-bd342f98  Not matched
# beamapp-0907200305-106040  Not matched
# beamapp-0907192915-394932  Not matched
# beamapp-0907191546-413196  Found, job ID: 2020-09-07...154989572
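The code above only reads the first page of results. The jobs list response can carry a nextPageToken, which is passed back as the pageToken query parameter to fetch the next page. Below is a minimal sketch that follows those tokens and collects every matching ID, assuming the response fields jobs, nextPageToken, name and id as used above. The helpers iter_jobs, find_job_ids and the fetch_page callable are hypothetical; fetch_page stands in for whatever performs one HTTP request, so the paging logic can be checked without network access.

```python
from typing import Callable, Dict, Iterator, List, Optional


def iter_jobs(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield every job across all pages of a jobs list response.

    fetch_page(token) is a hypothetical callable that performs one list request,
    sending token as the pageToken query parameter when it is not None, and
    returns the decoded JSON body.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("jobs", [])
        token = page.get("nextPageToken")
        if not token:  # no further pages
            return


def find_job_ids(fetch_page: Callable[[Optional[str]], Dict], job_name: str) -> List[str]:
    """Return the IDs of all jobs whose name equals job_name."""
    return [job["id"] for job in iter_jobs(fetch_page) if job.get("name") == job_name]


# Example wiring with requests (url and token placeholders as in the code above):
#   def fetch_page(token):
#       params = {"pageToken": token} if token else {}
#       r = requests.get(url, headers={"Authorization": "Bearer <BEARER_TOKEN_HERE>"}, params=params)
#       r.raise_for_status()
#       return r.json()
#   print(find_job_ids(fetch_page, "sample-job"))
```

Returning a list rather than a single ID is deliberate: several jobs (for example, an old cancelled run and a new one) can share the same name.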

Upvotes: 1

danielm

Reputation: 3010

You can use standard command-line tools to parse the output of that command, for example:

gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job" | tail -n 1 | cut -f 1 -d " "
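The same text parsing can be done in Python when you already have the command's output as a string. This is a minimal sketch, assuming the default table layout shown in the question: a header row first, then JOB_ID in the first column and NAME in the second, neither of which contains spaces. The helper parse_job_ids is hypothetical. Unlike tail -n 1, it returns every matching row instead of only the last line.

```python
from typing import List


def parse_job_ids(table_output: str, job_name: str) -> List[str]:
    """Extract JOB_IDs for job_name from `gcloud dataflow jobs list` table output."""
    lines = [line for line in table_output.splitlines() if line.strip()]
    ids = []
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        # fields[0] is JOB_ID and fields[1] is NAME; later columns may contain spaces.
        if len(fields) >= 2 and fields[1] == job_name:
            ids.append(fields[0])
    return ids


sample = """\
JOB_ID                                    NAME        TYPE       CREATION_TIME        STATE    REGION
2020-10-07_10_11_20-15879763245819496196  sample-job  Streaming  2020-10-07 17:11:21  Running  us-central1
"""
print(parse_job_ids(sample, "sample-job"))
# ['2020-10-07_10_11_20-15879763245819496196']
```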

Alternatively, if you are already working in a Python program, you can call the Dataflow API directly instead of going through the gcloud tool, as in "How to list down all the dataflow jobs using python API".

Upvotes: 1
