Reputation: 108
My JSON looks like:
{
"Key1":"Value1","Key2":"Value2","Key3":"Value3","List1":
[
{
"SubKey1":"SubValue1_1","SubKey2":"SubValue1_2","SubKey3":"SubValue1_3"
},
{
"SubKey1":"SubValue2_1","SubKey2":"SubValue2_2","SubKey3":"SubValue2_3"
},
{
"SubKey1":"SubValue3_1","SubKey2":"SubValue3_2","SubKey3":"SubValue3_3"
}
]
}
It loads in a single BigQuery table like this:
But I want my data to load in 2 seperate tables like:
and
Please guide on what I should do.
Upvotes: 0
Views: 350
Reputation: 219
It would be possible if you can use the bq command line.
Presuming that your JSON file (my_json_file.json) lives in a GCS bucket (e.g. my_gcs_bucket) and the destination table my_dataset.my_destination_table, you can run the following command
bq load --ignore_unknown_values --source_format=NEWLINE_DELIMITED_JSON my_dataset.my_destination_table "gs://my_gcs_bucket/my_json_file.json" ./schema.json
where in the schema.json, you have already selected the schema of the destination table. For instance, the following two schemas will load the data as expected:
schema_1.json
[
{
"mode": "NULLABLE",
"name": "Key1",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "Key2",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "Key3",
"type": "STRING"
}
]
and schema_2.json
[
{
"mode": "NULLABLE",
"name": "Key1",
"type": "STRING"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "SubKey1",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "SubKey2",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "SubKey3",
"type": "STRING"
}
],
"mode": "REPEATED",
"name": "List1",
"type": "RECORD"
}
]
and then
bq load --ignore_unknown_values --source_format=NEWLINE_DELIMITED_JSON my_dataset.my_destination_table_1 "gs://my_gcs_bucket/my_json_file.json" ./schema_1.json
bq load --ignore_unknown_values --source_format=NEWLINE_DELIMITED_JSON my_dataset.my_destination_table_2 "gs://my_gcs_bucket/my_json_file.json" ./schema_2.json
Will load two different tables based on the same JSON file
Upvotes: 1
Reputation: 800
I don't think it is possible to do it directly while loading the data.
I would advise loading the data as it is and then executing the following queries to create tables with data in the format you need. The "cs_test" is the name of the dataset for your tables, "so" is the name of the table with loaded data:
CREATE TABLE cs_test.so_2 AS SELECT key1, list FROM cs_test.so AS s, unnest(s.list1) AS list;
The data in "so_2" would look like this:
And to create a table with the data in the format like this:
You need to execute the following query:
CREATE TABLE cs_test.so_1 AS SELECT key1, key2, key3 FROM cs_test.so;
Upvotes: 2