Raoot
Raoot

Reputation: 1771

Add filename as column on import to BigQuery?

This is a question about importing data files from Google Cloud Storage to BigQuery.

I have a number of JSON files that follow a strict naming convention to include some key data not included in the JSON data itself.

For example:

xxx_US_20170101.json.gz
xxx_GB_20170101.json.gz
xxx_DE_20170101.json.gz

Which is client_country_date.json.gz At the moment, I have some convoluted processes in a Ruby app that reads the files, appends the additional data and then writes it back to a file that is then imported into a single daily table for the client in BigQuery.

I am wondering if it is possible to grab and parse the filename as part of the import to BigQuery? I could then drop the convoluted Ruby processes which occasionally fail on larger files.

Upvotes: 4

Views: 4934

Answers (1)

Felipe Hoffa
Felipe Hoffa

Reputation: 59225

You could define an external table pointing to your files:

enter image description here

Note that the table type is "external table", and that it points to multiple files with the * glob.

Now you can query for all data in these files, and query for the meta-column _FILE_NAME:

#standardSQL
SELECT *, _FILE_NAME filename
FROM `project.dataset.table` 

You can now save these results to a new native table.

enter image description here

Upvotes: 10

Related Questions