sl007
sl007

Reputation: 67

Bigquery javascript UDF with array

I'm trying to run below query in bigquery using standard SQL and javascript UDF. The query takes forever to run, thus I'm not even able to verify if the function is working or not. Can you please let me know if there is anything wrong with the query that makes it run forever? I tried to change the function call from IRRCalc(Array<FLOAT64> [cash_flow], ARRAY<INT64> [date_delta]) as IRR to IRRCalc(array(select cash_flow from input),array(select date_delta from input)) as IRR and it resolved the issue. Though I don't understand what's wrong with IRRCalc(Array<FLOAT64> [cash_flow], ARRAY<INT64> [date_delta]) as IRR. Can someone please have a look and shed some light? Many thanks.

Here's the query:

CREATE TEMPORARY FUNCTION IRRCalc(cash_flow ARRAY<FLOAT64>, date_delta ARRAY<INT64>)
RETURNS FLOAT64
LANGUAGE js AS """
  min = 0.0;
  max = 1.0;
  do {
    guess = (min + max) / 2;
    NPV = 0.0;
    for (var j=0; j<cash_flow.length; j++){
      NPV += cash_flow[j]/Math.pow((1+guess),date_delta[j]/365);
    }
    if (NPV > 0){
      min = guess;
    }
    else {
      max = guess;
    }
  } while (Math.abs(NPV) > 0.00000001);
  return guess * 100;

""";

WITH Input AS
(
select
  cash_flow_date,
  date_diff(cash_flow_date, min(cash_flow_date) over (),day) as date_delta,
  cash_flow as cash_flow
from cash_flow_table
)

SELECT 
  cash_flow,
  date_delta,
  IRRCalc(Array<FLOAT64> [cash_flow], ARRAY<INT64> [date_delta]) as IRR
FROM Input;

And here's the table containing the raw data:

Row cash_flow_date date_delta cash_flow
1 2017-09-08 0 -159951.78265102694
2 2017-09-08 0 -9.272567110204461
3 2017-09-08 0 -1000.0
4 2017-09-08 0 -159951.78265102694
5 2017-09-27 19 3552.8711640094157
6 2017-09-27 19 -544.122218768042
7 2018-03-28 201 -576.4290755116443
8 2018-03-28 201 3763.8202775817454
9 2018-04-02 206 437225.5536144294

Upvotes: 5

Views: 2941

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Can someone please have a look and shed some light?

to see the difference - just run your SELECT w/o UDF

SELECT 
  cash_flow,
  date_delta,
  ARRAY<FLOAT64> [cash_flow], 
  ARRAY<INT64> [date_delta]
FROM Input

As you can see here - for each row you create array with just one element in it - so actually two arrays with one element in each - that element that respectively belong to same row

when you do ARRAY(SELECT cash_flow FROM input), ARRAY(SELECT date_delta FROM input) you actually create arrays which with respective elements from all rows

finally - when you pass ARRAY with just one element in it - it looks like your while (Math.abs(NPV) > 0.00000001) always true thus loop runs forever

Something along these lines I think

Note: above answers your exact question - but you still most likely have issue with logic - if so - ask new specific question

Upvotes: 4

Related Questions