KISHAN DHRANGADHARIYA
KISHAN DHRANGADHARIYA

Reputation: 80

Is it possible to read gcs object's metadata from BigQuery using UDF

I am ingesting data to BigQuery tables from GCS objects (using csv and json filetypes (in separate tables)), I want to perform data validation, so during file upload to GCS I have stored row count in file metadata, now after ingestion is complete, I want to validate if the row count in ingested table matches the row count in file metadata.

Note: I thought of utilising js availability in UDFs but it doesn't allow us to use external library

Update:

According to this article on medium, they are saying that we can use external library by posting a compiled js package as .js file to Cloud Storage and calling it using options in UDF

since I don't know anything about js, I am not sure how to follow this article.

Note: I have also posted a code snipet available on the article.

https://hoffa.medium.com/new-in-bigquery-persistent-udfs-c9ea4100fd83

CREATE OR REPLACE FUNCTION x.nlp_compromise_number(str STRING)
RETURNS NUMERIC LANGUAGE js AS '''
   return nlp(str).values(0).toNumber().out()
'''
OPTIONS (
  library="gs://fh-bigquery/js/compromise.min.11.14.0.js");

Upvotes: 1

Views: 622

Answers (2)

guillaume blaquiere
guillaume blaquiere

Reputation: 76083

You can't perform external API call with BigQuery. Neither in UDF nor in standard query. You can't get data from outside (GCP or elsewhere). You have to store all the data in BigQuery before running your queries.

Upvotes: 1

Malaman
Malaman

Reputation: 314

There is no way to pull the Object metadata [1] using UDF in Standard SQL [2]. However, it would be easier to use a Cloud Function to perform the data validation becuase it will be easier to pull from both sources and perform the comparision.

[1] - https://cloud.google.com/storage/docs/viewing-editing-metadata#view

[2] - https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions

Upvotes: 1

Related Questions