cryanbhu
cryanbhu

Reputation: 5254

Insert a subset of a struct into BigQuery table, and have remaining struct fields null

I am trying to insert into a BigQuery table that has a struct.
The struct is very large and I am only inserting one field.

I used the DML statement below:

INSERT INTO `my-project.my-dataset.sesh_20200101` (trafficSource)
VALUES (STRUCT('campaign_123058182' AS campaign))

I get the query error

Value has type STRUCT<campaign STRING> which cannot be inserted into column trafficSource, which has type STRUCT<referralPath STRING, campaign STRING, source STRING, ...> at [2:9]

Is there a way to have the missing (non-specified) remaining fields of the trafficSource struct be filled with null without typing them all out in the query?
I saw some other question like this but it was done in Go. I would like to do it in SQL DML if possible.

Upvotes: 4

Views: 4873

Answers (4)

Anna
Anna

Reputation: 59

You can use SQL to do that by specifying the struct type in your INSERT/UPDATE.

Google docs provide an example for partial struct UPDATE.

And here is how you can do INSERT:

CREATE TABLE `temp.struct`
AS SELECT STRUCT(1 AS a, 2 AS b, 3 AS c) AS s;

INSERT `temp.struct` (s)
VALUES (
  STRUCT<a INT64, b INT64, c INT64> -- here is your full struct type definition
  (11, null, 33) -- and this is the value that you insert
); 

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269873

Here is an idea. Extract a structure with NULL values in each field. Then assign one of the fields and convert back to a struct. That would be:

INSERT INTO `my-project.my-dataset.sesh_20200101` (trafficSource)
    SELECT trafficSource
    FROM (SELECT ANY_VALUE(trafficSource).* REPLACE ('campaign_123058182' AS campaign)
          FROM `my-project.my-dataset.sesh_20200101`
          WHERE 1 = 0
        ) trafficSource;

This appears to work even if trafficSource has arrays and structs as components. Resulting arrays are empty and the struct has NULL values.

Upvotes: 3

Yun Zhang
Yun Zhang

Reputation: 5503

(I wrote this mostly for fun, but it could be more fun if you find it useful in some way)

Below approach allows you to at least only specify prefix of all the fields of your struct:

create temp table test as
select struct<a int64, b int64, c int64, d int64, e int64>(1,2,3,4,5) col;

begin
-- Use a variable and "DEFAULT" to avoid specifying full type
declare x default (select col from test limit 0);

-- Magic here is to use "x.* EXCEPT(a)" to specify the rest of the fields as NULLs
insert into test(col)
select as struct 100 a, x.* EXCEPT(a);

end

Below approach is more like what you were asking for, but it is more expensive for it uses a staging table.

create temp table myUpdates as 
Select trafficSource.* from `my-project.my-dataset.sesh_20200101` limit 0;

insert into myUpdates(campaign) values ('campaign_123058182');

INSERT INTO `my-project.my-dataset.sesh_20200101` (trafficSource)
select as struct * from myUpdates;

Upvotes: 1

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

The easiest way I found to do this is outside SQL, with bq load.

First, I created a table:

CREATE TABLE `temp.struct`
AS SELECT STRUCT(1 AS a, 2 AS b, 3 AS c) AS s

Now, if I want to add a row with a single value of that struct filled in, I can create a file like:

{"s":{"b":9}}

And now I can load that file, which will fill every other struct column as null:

bq load --source_format=NEWLINE_DELIMITED_JSON temp.struct struct.json

Upvotes: 1

Related Questions