Kishan Kumar
Kishan Kumar

Reputation: 183

How to only get only errors from insert_rows_from_dataframe method in Bigquery Client?

I am using client.insert_rows_from_dataframe method to insert data into my table.

obj = client.insert_rows_from_dataframe(table=TableRef, dataframe=df)

If there is no errors, obj will be an empty list of lists like

> print(obj)
[[] [] []]

But I want to know how to get the error messages out, if there are some errors while inserting?

I tried

obj[["errors"]] ? 

but that is not correct. Please help.

Upvotes: 2

Views: 2549

Answers (1)

rmesteves
rmesteves

Reputation: 4085

To achieve the results that you want, you must set to your DataFrame a header identical to the one in your schema. For example, if you schema in BigQuery has the fields index and name, your DataFrame should have these two columns.

Lets take a look in the example below:

  1. I created an table in BigQuery named insert_from_dataframe which contains the fields index, name and number, respectively INTEGER, STRING and INTEGER, all of them REQUIRED.
  2. In the image below you can see that the insertion cause no errors.In the second image, we can see that the data was inserted.

enter image description here No erros raised



enter image description here

Data inserted successfully



  1. After that, I removed the column number for the last row of the same data. As you can see below, when I tried to push it to BigQuery, I got an error.

enter image description here

Given that, I would like to reinforce two points:

  1. The error structured that is returned is a list of lists ( [],[],[],...]). The reason for that is because your data is supposed to be pushed in chunks (subsets of your data). In the function used you can specify how many rows each chunk will have using the parameter chunk_size=<number_of_rows>. Lets suppose that your data has 1600 rows and your chunk size is 500. You data will be divided into 4 chunks. The object returned after the insert request, hence, will consist of 4 lists inside a list, where each one of the four lists is related to one chunk. Its also important to say that if a row fails the process, all the rows inside the same chunk will not be inserted in the table.
  2. If you are using string fields you should pay attention in the data inserted. Sometimes Pandas read null values as empty strings and it leads to a misinterpretation of the data by the insertion mechanism. In other words, its possible that you have empty strings inserted in your table while the expected result would be an error saying that the field can not be null.

Finally, I would like to post here some useful links for this problem:

  1. BigQuery client documentation
  2. Working with missing values in Pandas

I hope it helps.

Upvotes: 4

Related Questions