BigQuery ARRAY_TO_STRING based on condition in non-array field

Question

I have a table that I query like this...

select *
from table
where productId = 'abc123'

Which returns 2 rows (even though the productId is unique) because one of the columns (orderName) is an Array...

**productId, productName, created, featureCount, orderName**
abc123, someProductName, 2020-01-01, 12, someOrderName
      ,                ,           ,   , someOtherOrderName

I'm not sure whether the missing values in the 2nd row are empty strings or nulls because of the way the orderName array expands my search results but I want to now run a query like this...

select productName, ARRAY_TO_STRING(orderName,'-')
from table
where productId = 'abc123'
and ifnull(featureCount,0) > 0

But this query returns...

someProductName, someOrderName-someOtherOrderName

i.e. both array values came back even though I specified a condition of featureCount>0.

I'm sure I'm missing something very basic about how Arrays function in BigQuery but from Google's ARRAY_TO_STRING documentation I don't see any way to add a condition to the extracting of ARRAY values. Appreciate any thoughts on the best way to go about this.

Betjens · Accepted Answer

For what I understand, this is because you are just querying one row of data which have a column as ARRAY. As you are using ARRAY_TO_STRINGS it will only accept ARRAY values you will see all array values fit into just one cell.

So, when you run your script, your output will fit your criteria and return the columns with arrays with additional rows for visibility.

The visualization on the UI should look like your mention in your question:

Row	productId	productName	created	featureCount	orderName
1	abc123	someProductName	2020-01-01	12	someOrderName
					someOtherOrderName

Note: On bigquery this additional row is gray out ( ) and Its part of row 1 but it shows as an additional row for visibility. So this output only have 1 row in the table.

And the visualization on a JSON will be:

[
  {
    "productId": "abc123",
    "productName": "someProductName",
    "created": "2020-01-01",
    "featureCount": "12",
    "orderName": [
      "someOrderName",
      "someOtherOrderName"
    ]
  }
]

I don't think there is specific documentation info about how you visualize arrays on UI but I can share the docs that talks about how to flattening your rows outputs into a single row line, check:

I use the following to replicate your issue:

CREATE OR REPLACE TABLE `project-id.dataset.working_table` (
 productId STRING,
 productName STRING,
 created STRING,
 featureCount STRING,
 orderName ARRAY
);

insert into `project-id.dataset.working_table` (productId,productName,created,featureCount,orderName) 
values ('abc123','someProductName','2020-01-01','12',['someOrderName','someOtherOrderName']);
 
insert into `project-id.dataset.working_table` (productId,productName,created,featureCount,orderName) 
values ('abc123X','someProductNameX','2020-01-02','15',['someOrderName','someOtherOrderName','someData']);

output

Row	productId	productName	created	featureCount	orderName
1	abc123	someProductName	2020-01-01	12	someOrderName
					someOtherOrderName
2	abc123X	someProductNameX	2020-01-02	15	someOrderName
					someOtherOrderName
					someData

Note: Table contains 2 rows.

BigQuery ARRAY_TO_STRING based on condition in non-array field

Answers (1)

Related Questions