Reputation: 1311
I want to check the value of dml_affected_rows with BigQuery Client.
I have checked [1], and it works when the query (or DML) contains a single job. However, when the results are written from children's jobs like the below script, the num_dml_affected_rows
is void. How can I check the child job's num_dml_affected_rows
. I am think something like below.
query_job.children_job[-1].num_dml_affected_rows # (last child dml result)
But couldn't find the right way (or method) to accomplish it from the documents [2, 3].
[1] BigQuery update how to get number of updated rows
[2] BQ REST
[3] BQ Client API (dev)
Sample Script (python3)
sql = """
INSERT `project.dataset.table` (logdatetime)
SELECT
CURRENT_DATETIME() as logdatetime
;
SELECT 1
;
INSERT `project.dataset.table` (logdatetime)
SELECT
CURRENT_DATETIME() as logdatetime
;
"""
query_job = bigquery.Client().query(sql)
query_job.result()
print(query_job.num_child_jobs) # 3
print(query_job.num_dml_affected_rows) # None
p.s. My current solution is to store the rows counts of destination table
before and after the DML to check the difference, but this is unnecessary if I am able to directly access DML results of the child job.
Upvotes: 3
Views: 1071
Reputation: 96
This may be a bit late for the original question but in case someone else stumbles across this, here is my solution:
def run_bigquery_query():
bigquery_client = bigquery.Client()
sql_query = """ SELECT / INSERT / ... """
job = bigquery_client.query(sql_query)
job.result() # Wait for the whole script to finish
affected_rows = 0
for child_job in bigquery_client.list_jobs(parent_job=job.job_id):
if child_job.num_dml_affected_rows is not None:
affected_rows += child_job.num_dml_affected_rows
print("Affected rows: {}".format(affected_rows))
Upvotes: 4
Reputation: 5503
The way that works in scripting is the system variable @@row_count.
If used in a script and the previous script statement is DML, specifies the number of rows modified, inserted, or deleted, as a result of that DML statement.
In your case, if you do
sql = """
INSERT `project.dataset.table` (logdatetime)
SELECT
CURRENT_DATETIME() as logdatetime
;
SELECT @@row_count -- output how many row inserted by previous DML
;
INSERT `project.dataset.table` (logdatetime)
SELECT
CURRENT_DATETIME() as logdatetime
;
SELECT @@row_count -- output how many row inserted by previous DML
;
"""
query_job = bigquery.Client().query(sql)
query_job.result()
print(query_job.num_child_jobs) # you're getting 4 jobs here
print(query_job.num_dml_affected_rows) # still None
Upvotes: 4