Dimitrios Mistriotis

Reputation: 2788

Google BigQuery bulk load data into table

I think that what I want to do is not feasible at the moment but want to clarify.

I have a bucket, say bucketA, with files served to the public, and a bucket, say bucketB, where the access logs of bucketA are stored in a specific CSV format.

What I want to do is run SQL queries against these access logs. The problem I have is that the logs are stored in separate CSVs (one per hour, I think). I tried to import them through the BigQuery UI, but it seems that there is a one-to-one CSV-to-table mapping. When you define the input location, the placeholder and the documentation ask you to put a gs://<bucket_name>/<path_to_input_file>.

Based on the above, my question is: is it possible to upload all files in a bucket to a single BigQuery table, with something like an "*" asterisk operator?

Once the table is constructed, what happens when more files with data get stored in the bucket? Do I need to re-run the import, or is there a scheduler?

Upvotes: 1

Views: 3319

Answers (2)

oraclesoon

Reputation: 803

Using the BigQuery web UI, I created a new table with some initial data via the standard CSV upload method.

For quick testing, how can I use the BigQuery web UI to insert more new data into the existing table?

I realized I CANNOT copy and paste multiple INSERT statements into the Query editor textbox:

INSERT INTO dataset.myschema VALUES ('new value1', 'more value1');
INSERT INTO dataset.myschema VALUES ('new value2', 'more value2');

Wow, it would be tedious to insert new rows of data one by one.

Luckily, BigQuery supports INSERT statements that use the VALUES syntax to insert multiple rows:

INSERT INTO dataset.myschema VALUES ('new value1', 'more value1'),
('new value2', 'more value2');

Upvotes: 0

Graham Polley

Reputation: 14781

Based on the above, my question is: is it possible to upload all files in a bucket to a single BigQuery table, with something like an "*" asterisk operator?

You can query them directly in GCS (federated source) or load them all into a native table, using * in both cases:

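For example, in SQL form a rough sketch of both options might look like the following (the mydataset.access_logs names and the gs://bucketB/* URI are placeholders for your own dataset, table, and log bucket, and depending on your log format you may need to spell out the schema instead of relying on auto-detection):

-- Option 1: external (federated) table; BigQuery reads the CSVs straight from GCS at query time
CREATE EXTERNAL TABLE mydataset.access_logs_ext
OPTIONS (
  format = 'CSV',
  uris = ['gs://bucketB/*']  -- the wildcard matches every log file in the bucket
);

-- Option 2: load everything matching the wildcard into a native table
LOAD DATA INTO mydataset.access_logs
FROM FILES (
  format = 'CSV',
  uris = ['gs://bucketB/*']
);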

Once the table is constructed, what happens when more files with data get stored in the bucket? Do I need to re-run the import, or is there a scheduler?

If you leave it as an external table, then each time you query, BigQuery will scan all the files, so you'll pick up new files/data. If you load it into a native table, then you'll need to schedule a job yourself to append each new file to your table.
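For example, a scheduled job could run something like this to append only a newly arrived file (again, the table name and file path are placeholders; re-running the full gs://bucketB/* wildcard load would duplicate the rows you already loaded):

-- Append just the new log file to the existing native table
-- (LOAD DATA INTO appends; LOAD DATA OVERWRITE would replace the table contents)
LOAD DATA INTO mydataset.access_logs
FROM FILES (
  format = 'CSV',
  uris = ['gs://bucketB/<path_to_new_log_file>.csv']  -- placeholder for the specific new file
);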

Upvotes: 2
