Reputation: 876
I have been using the google-cloud Python library to submit a BigQuery job that extracts a table to GCS, like this:
dataset = self.bqClient.dataset(self.website_id)
table = dataset.table(table_name)
job_name = str(uuid.uuid4())  # random job id
job = self.bqClient.extract_table_to_storage(
    job_name, table, destination)
job.destination_format = "NEWLINE_DELIMITED_JSON"
job.compression = 'GZIP'
job.begin()  # starts the extract job asynchronously
Here the job id (or name) is the uuid4. Basically I am collecting these job ids in a queue and want to check later whether a given job is in the DONE state. How can I do that?
I have been searching for this but so far had no luck. I was only able to find the function client.list_jobs(), but that returns the list of all jobs. I only want to query (or search) for one particular job.
Upvotes: 0
Views: 2793
Reputation: 11777
You could just filter by the name attribute to find the job you want. Say you want information about a job whose id is "big name string job 1". You can filter it out of the list of jobs by running:
job_name = "big name string job 1"
jobs = [job for job in self.bqClient.list_jobs() if job.name == job_name]
if jobs:  # guard: the list is empty if no job matches
    job = jobs[0]
    print(job.state)  # DONE or RUNNING
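Once you have the job object, you may want to block until it finishes. A minimal polling sketch, assuming the job exposes the `reload()` method and `state` attribute as the 0.26-era extract job does (`wait_for_done` is a hypothetical helper, not part of the client):

```python
import time

def wait_for_done(job, poll_interval=1.0, timeout=60.0):
    """Poll job.reload() until job.state == 'DONE' or the timeout expires.

    Works with any object exposing reload() and state, e.g. the job
    returned by extract_table_to_storage in the 0.26-era client.
    Returns True if the job reached DONE, False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        job.reload()  # refreshes job.state from the API
        if job.state == 'DONE':
            return True
        time.sleep(poll_interval)
    return False
```

You could then call `wait_for_done(job, poll_interval=5.0)` for each id popped off your queue.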
Just make sure to update the client if possible; currently we are at version 0.26.0.
[EDIT]:
You said in your comment that you have around 1 million jobs. As for fetching a single job, the client currently issues that request only inside the exists and reload methods, so there's no way of running something like client.get_job(job_name).
Still, in list_jobs you can pass the arguments all_users and state_filter, something like:
job_name = "big name string job 1"
job = [job for job in self.bqClient.list_jobs(all_users=False, state_filter='done')
       if job.name == job_name][0]  # still raises IndexError if no job matches
This lists jobs only for the user currently authorized in the client, filtered to the given state.
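To avoid indexing into a possibly empty list, the lookup can be written with `next()` and a default. A small sketch (`find_job` is a hypothetical helper; it works over any iterable of job objects with a `name` attribute, such as the iterator `list_jobs()` returns):

```python
def find_job(jobs, job_name):
    """Return the first job whose name matches job_name, or None.

    `jobs` can be any iterable of job objects exposing a `name`
    attribute, e.g. the iterator returned by client.list_jobs(...).
    Stops iterating as soon as a match is found.
    """
    return next((job for job in jobs if job.name == job_name), None)
```

Usage would be something like `find_job(self.bqClient.list_jobs(all_users=False, state_filter='done'), job_name)`; a `None` result means no job with that name was listed.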
If that still lists millions, then you could (as a somewhat "hacky" solution) query it directly through the job base constructor, something like:
from google.cloud.bigquery.job import _AsyncJob
job = _AsyncJob(job_name, self.bqClient)
job.reload()
print(job.state)  # RUNNING or DONE
This is an indirect way of running a client.get_job() call.
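That constructor trick can be wrapped into a small get_job-like helper. In this sketch the job class is passed in as a parameter (`get_job_state` is a hypothetical name), since `_AsyncJob` is a private 0.26-era internal and may move or change between versions:

```python
def get_job_state(client, job_name, job_cls):
    """Fetch a single job's state by name via the job constructor trick.

    job_cls is the job class to instantiate, e.g. the (private)
    google.cloud.bigquery.job._AsyncJob from the 0.26 client; it is
    injected here so this helper carries no version-fragile import.
    """
    job = job_cls(job_name, client)
    job.reload()  # issues the per-job fetch against the API
    return job.state
```

Usage would be something like `get_job_state(self.bqClient, job_name, _AsyncJob)`; since `_AsyncJob` is private, pin the client version if you rely on this.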
This might be an interesting feature request for the Python client repository.
Upvotes: 2