Reputation: 67
I'm trying to run below query in bigquery using standard SQL and javascript UDF. The query takes forever to run, thus I'm not even able to verify if the function is working or not. Can you please let me know if there is anything wrong with the query that makes it run forever? I tried to change the function call
from IRRCalc(Array<FLOAT64> [cash_flow], ARRAY<INT64> [date_delta]) as IRR
to IRRCalc(array(select cash_flow from input),array(select date_delta from input)) as IRR
and it resolved the issue. Though I don't understand what's wrong with IRRCalc(Array<FLOAT64> [cash_flow], ARRAY<INT64> [date_delta]) as IRR
. Can someone please have a look and shed some light? Many thanks.
Here's the query:
CREATE TEMPORARY FUNCTION IRRCalc(cash_flow ARRAY<FLOAT64>, date_delta ARRAY<INT64>)
RETURNS FLOAT64
LANGUAGE js AS """
min = 0.0;
max = 1.0;
do {
guess = (min + max) / 2;
NPV = 0.0;
for (var j=0; j<cash_flow.length; j++){
NPV += cash_flow[j]/Math.pow((1+guess),date_delta[j]/365);
}
if (NPV > 0){
min = guess;
}
else {
max = guess;
}
} while (Math.abs(NPV) > 0.00000001);
return guess * 100;
""";
WITH Input AS
(
select
cash_flow_date,
date_diff(cash_flow_date, min(cash_flow_date) over (),day) as date_delta,
cash_flow as cash_flow
from cash_flow_table
)
SELECT
cash_flow,
date_delta,
IRRCalc(Array<FLOAT64> [cash_flow], ARRAY<INT64> [date_delta]) as IRR
FROM Input;
And here's the table containing the raw data:
Row cash_flow_date date_delta cash_flow
1 2017-09-08 0 -159951.78265102694
2 2017-09-08 0 -9.272567110204461
3 2017-09-08 0 -1000.0
4 2017-09-08 0 -159951.78265102694
5 2017-09-27 19 3552.8711640094157
6 2017-09-27 19 -544.122218768042
7 2018-03-28 201 -576.4290755116443
8 2018-03-28 201 3763.8202775817454
9 2018-04-02 206 437225.5536144294
Upvotes: 5
Views: 2941
Reputation: 172993
Can someone please have a look and shed some light?
to see the difference - just run your SELECT w/o UDF
SELECT
cash_flow,
date_delta,
ARRAY<FLOAT64> [cash_flow],
ARRAY<INT64> [date_delta]
FROM Input
As you can see here - for each row you create array with just one element in it - so actually two arrays with one element in each - that element that respectively belong to same row
when you do ARRAY(SELECT cash_flow FROM input), ARRAY(SELECT date_delta FROM input)
you actually create arrays which with respective elements from all rows
finally - when you pass ARRAY with just one element in it - it looks like your while (Math.abs(NPV) > 0.00000001)
always true thus loop runs forever
Something along these lines I think
Note: above answers your exact question - but you still most likely have issue with logic - if so - ask new specific question
Upvotes: 4