Reputation: 1771
This is a question about importing data files from Google Cloud Storage to BigQuery.
I have a number of JSON files that follow a strict naming convention, encoding some key data that is not present in the JSON itself.
For example:
xxx_US_20170101.json.gz
xxx_GB_20170101.json.gz
xxx_DE_20170101.json.gz
That is, the pattern is client_country_date.json.gz.
At the moment, I have some convoluted processes in a Ruby app that read the files, append the additional data, and then write it back to a file, which is then imported into a single daily table for the client in BigQuery.
Is it possible to grab and parse the filename as part of the import to BigQuery? I could then drop the convoluted Ruby processes, which occasionally fail on larger files.
Upvotes: 4
Views: 4934
Reputation: 59225
You could define an external table pointing to your files. Note that the table type is "external table", and that it points to multiple files with a * glob.
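For example, one way to create such a table is with BigQuery's CREATE EXTERNAL TABLE DDL. A minimal sketch, assuming the files live under gs://my-bucket (the project, dataset, table, and bucket names are placeholders, and the schema is left to auto-detection):

#standardSQL
-- External table over the gzipped JSON files in Cloud Storage;
-- the * glob matches all the client/country/date variants at once.
CREATE EXTERNAL TABLE `project.dataset.table`
OPTIONS (
  format = 'NEWLINE_DELIMITED_JSON',  -- one JSON object per line
  compression = 'GZIP',
  uris = ['gs://my-bucket/xxx_*.json.gz']  -- placeholder bucket/path
);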
Now you can query for all the data in these files, and for the meta-column _FILE_NAME:
#standardSQL
SELECT *, _FILE_NAME AS filename
FROM `project.dataset.table`
You can now save these results to a new native table.
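For example, a sketch of that last step that also parses the client, country, and date out of the filename while materializing the native table (the table names and regular expressions are illustrative, and assume the strict naming convention from the question holds for every file):

#standardSQL
-- Materialize a native table, pulling the metadata that only
-- exists in the filename into real columns.
CREATE TABLE `project.dataset.native_table` AS
SELECT
  *,
  REGEXP_EXTRACT(_FILE_NAME, r'/([^/_]+)_[A-Z]{2}_\d{8}\.json\.gz$') AS client,
  REGEXP_EXTRACT(_FILE_NAME, r'_([A-Z]{2})_\d{8}\.json\.gz$') AS country,
  PARSE_DATE('%Y%m%d',
    REGEXP_EXTRACT(_FILE_NAME, r'_(\d{8})\.json\.gz$')) AS file_date
FROM `project.dataset.table`;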
Upvotes: 10