Reputation: 80
I am ingesting data to BigQuery tables from GCS objects (using csv and json filetypes (in separate tables)), I want to perform data validation, so during file upload to GCS I have stored row count in file metadata, now after ingestion is complete, I want to validate if the row count in ingested table matches the row count in file metadata.
Note: I thought of utilising js availability in UDFs but it doesn't allow us to use external library
Update:
According to this article on medium, they are saying that we can use external library by posting a compiled js package as .js file to Cloud Storage and calling it using options in UDF
since I don't know anything about js, I am not sure how to follow this article.
Note: I have also posted a code snipet available on the article.
https://hoffa.medium.com/new-in-bigquery-persistent-udfs-c9ea4100fd83
CREATE OR REPLACE FUNCTION x.nlp_compromise_number(str STRING)
RETURNS NUMERIC LANGUAGE js AS '''
return nlp(str).values(0).toNumber().out()
'''
OPTIONS (
library="gs://fh-bigquery/js/compromise.min.11.14.0.js");
Upvotes: 1
Views: 622
Reputation: 76083
You can't perform external API call with BigQuery. Neither in UDF nor in standard query. You can't get data from outside (GCP or elsewhere). You have to store all the data in BigQuery before running your queries.
Upvotes: 1
Reputation: 314
There is no way to pull the Object metadata [1] using UDF in Standard SQL [2]. However, it would be easier to use a Cloud Function to perform the data validation becuase it will be easier to pull from both sources and perform the comparision.
[1] - https://cloud.google.com/storage/docs/viewing-editing-metadata#view
[2] - https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions
Upvotes: 1