d3wannabe
d3wannabe

Reputation: 1317

BigQuery ARRAY_TO_STRING based on condition in non-array field

I have a table that I query like this...

select *
from table
where productId = 'abc123'

Which returns 2 rows (even though the productId is unique) because one of the columns (orderName) is an Array...

**productId, productName, created, featureCount, orderName**
abc123, someProductName, 2020-01-01, 12, someOrderName
      ,                ,           ,   , someOtherOrderName

I'm not sure whether the missing values in the 2nd row are empty strings or nulls because of the way the orderName array expands my search results but I want to now run a query like this...

select productName, ARRAY_TO_STRING(orderName,'-')
from table
where productId = 'abc123'
and ifnull(featureCount,0) > 0

But this query returns...

someProductName, someOrderName-someOtherOrderName

i.e. both array values came back even though I specified a condition of featureCount>0.

I'm sure I'm missing something very basic about how Arrays function in BigQuery but from Google's ARRAY_TO_STRING documentation I don't see any way to add a condition to the extracting of ARRAY values. Appreciate any thoughts on the best way to go about this.

Upvotes: 0

Views: 719

Answers (1)

Betjens
Betjens

Reputation: 1401

For what I understand, this is because you are just querying one row of data which have a column as ARRAY<STRING>. As you are using ARRAY_TO_STRINGS it will only accept ARRAY<STRING> values you will see all array values fit into just one cell.

So, when you run your script, your output will fit your criteria and return the columns with arrays with additional rows for visibility.

The visualization on the UI should look like your mention in your question:

Row productId productName created featureCount orderName
1 abc123 someProductName 2020-01-01 12 someOrderName
someOtherOrderName

Note: On bigquery this additional row is gray out ( ) and Its part of row 1 but it shows as an additional row for visibility. So this output only have 1 row in the table.

And the visualization on a JSON will be:

[
  {
    "productId": "abc123",
    "productName": "someProductName",
    "created": "2020-01-01",
    "featureCount": "12",
    "orderName": [
      "someOrderName",
      "someOtherOrderName"
    ]
  }
]

I don't think there is specific documentation info about how you visualize arrays on UI but I can share the docs that talks about how to flattening your rows outputs into a single row line, check:

I use the following to replicate your issue:

CREATE OR REPLACE TABLE `project-id.dataset.working_table` (
 productId STRING,
 productName STRING,
 created STRING,
 featureCount STRING,
 orderName ARRAY<STRING>
);

insert into `project-id.dataset.working_table` (productId,productName,created,featureCount,orderName) 
values ('abc123','someProductName','2020-01-01','12',['someOrderName','someOtherOrderName']);
 
insert into `project-id.dataset.working_table` (productId,productName,created,featureCount,orderName) 
values ('abc123X','someProductNameX','2020-01-02','15',['someOrderName','someOtherOrderName','someData']);
 

output

Row productId productName created featureCount orderName
1 abc123 someProductName 2020-01-01 12 someOrderName
someOtherOrderName
2 abc123X someProductNameX 2020-01-02 15 someOrderName
someOtherOrderName
someData

Note: Table contains 2 rows.

Upvotes: 1

Related Questions