Hal
Hal

Reputation: 333

How to group by condition and average only if column value is not null in bigquery sql

Hi I have a table that shows the category of product and another table with daily price of the product. I would like to get the average price of the category where average not count null values. How do I achieve this? Example of table product

product category
apple fruit
pear fruit
grape fruit
celery vegetables
cabbage vegetables
chicken meat
turkey meat
beef meat

another table with daily price and productid as columns and the price in the rows

date apple pear grape celery cabbage chicken turkey beef
2022-01-01 2 4 1 2 3 4 3
2022-01-02 2 2 2 4 3
2022-01-03 2 2 2 3

into

date fruit vegetables meat
2022-01-01 3 1.5 3.3
2022-01-02 2 2 3.5
2022-01-02 2 2 3

Where average is only to columns where it is not null, it would be better not to do it manually.

Upvotes: 0

Views: 218

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172964

Consider also below approach

create temp function keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
""";
create temp function values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
""";
select *
from (
  select date, category, round(avg(safe_cast(price as float64)), 2) avg_price
  from prices t, unnest([struct(to_json_string(t) as json)]),
  unnest(keys(json)) product with offset
  join unnest(values(json)) price with offset using(offset)
  left join products using(product)
  where product != 'date'
  group by date, category
)
pivot (any_value(avg_price) for category IN ('fruit', 'vegetables', 'meat'))  

if applied to sample data in your question - output is

enter image description here

Potential benefit of using above is to eliminate need in enlisting all column names from products table, which are 8 in your example but in reality most likely much more! Obviously, another way to address this is to build dynamic query and run it using execute immediate which you can find quite a number of examples here on SO.
But, assuming that number of categories is significantly lower (just few as in your example) to compare with number of products - I would use this approach as execute immediate has its own drawbacks ...

Upvotes: 0

Jaytiger
Jaytiger

Reputation: 12234

Consider below query using UNPIVOT AND PIVOT:

SELECT * FROM (
  SELECT date, category, price 
    FROM prices UNPIVOT (price FOR productid IN (apple, pear, grape, celery, cabbage, chicken, turkey, beef)) p
    JOIN category c ON c.product = p.productid
) PIVOT (AVG(price) FOR category IN ('fruit', 'vegetables', 'meat'))
ORDER BY date;

enter image description here

Upvotes: 1

Related Questions