Adiansyah
Adiansyah

Reputation: 353

SQL Bigquery fill null values with calculated from certain factor

I want to fill null value with a new price value. The new price value will calculated from the other product available price (same product) times the factor.

given table,

Prod | unit | factor |  price

abc     X       1       24000
abc     Y       12      NULL
xyz     X       1       NULL
xyz     y       5       60000
xyz     Z       20      NULL

that formula that comes to mind

null price = avail same prod price * it's factor/null price factor

with the existing table above, examples price formula will be

'abc Y price' =  20000 * 1 / 12 = 2000   (avail price is abc X)
'xyz X price' =  60000 * 5 / 1 = 300000  (avail price is xyz Y)
'xyz Z price' =  60000 * 5 / 20 = 15000  (avail price is xyz Y)

is there any way i can do this?

Upvotes: 0

Views: 1047

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269793

I think this does what you want:

select t.*,
       coalesce(price,
                max(price * factor) over (partition by prod) / factor
               ) as calculated_price
from t;

This replaces NULL prices with the maximum price * factor for the product -- then divided by the factor on the given row.

Upvotes: 2

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

if a product has 2 or more price list, just fill the null with the lowest factor

#standardSQL
SELECT t.* REPLACE(IFNULL(t.price, t.factor * p.price / p.factor) AS price)
FROM `project.dataset.table` t
LEFT JOIN (
  SELECT prod, ARRAY_AGG(STRUCT(price, factor) ORDER BY factor LIMIT 1)[SAFE_OFFSET(0)].*
  FROM `project.dataset.table`
  WHERE NOT price IS NULL
  GROUP BY prod
) p
USING(prod)  

If to apply to sample from your question - result is

Row prod    unit    factor  price    
1   abc     X       1       24000.0  
2   abc     Y       12      288000.0     
3   xyz     X       1       12000.0  
4   xyz     Y       5       60000.0  
5   xyz     Z       20      240000.0     

Note: it looks like in your formula you need to reverse factors - for example 60000 * 20 / 5 - not sure, but this looks more logical for me. If I am wrong you can adjust t.factor * p.price / p.factor and use p.factor * p.price / t.factor instead

In this case result will be (which matches what you expected but as I said already I suspect is wrong -but it is up to you obviously)

Row prod    unit    factor  price    
1   abc     X       1       24000.0  
2   abc     Y       12      2000.0   
3   xyz     X       1       300000.0     
4   xyz     Y       5       60000.0  
5   xyz     Z       20      15000.0  

Upvotes: 1

Related Questions