sl007
sl007

Reputation: 67

JS UDF with partition of input data from bigquery standard sql

I'm trying to pass two sets of cash flows (partition by field "id") to js udf IRRCalc and calculate an IRR number for each cash flow set.

  CREATE TEMPORARY FUNCTION IRRCalc(cash_flow ARRAY<FLOAT64>, date_delta ARRAY<INT64>)
    RETURNS FLOAT64
    LANGUAGE js AS """
      min = 0.0;
      max = 100.0;
      iter_cnt = 0;
      do {
        guess = (min + max) / 2;
        NPV = 0.0;
        for (var j=0; j<cash_flow.length; j++){
          NPV += cash_flow[j]/Math.pow((1+guess),date_delta[j]/365);
        }
        if (cash_flow[0] > 0){
          if (NPV > 0){
            max = guess;
          }
          else {
            min = guess;
          }
        }
        if (cash_flow[0] < 0){
          if (NPV > 0){
            min = guess;
          }
          else {
            max = guess;
          }
        }
        iter_cnt = iter_cnt+1;
      } while (Math.abs(NPV) > 0.00000001 && iter_cnt<8192);
      return guess;

    """;
WITH Input AS
 (
  select
    id,
    scenario_date,
    cash_flow_date,
    date_diff(cash_flow_date, min(cash_flow_date) over (partition by id),day) as date_delta,
    sum(cash_flow) as cash_flow
  from cash_flow_table
  where id in ('1','2')
  group by 1,2,3
  order by 1,2,3
 )

 select 
    id, 
    IRRCalc(array(select cash_flow from input), array(select date_delta from input)) as IRR
 from input
 group by 1

Input data:

Row id  scenario_date   cash_flow_date  date_delta  cash_flow    
1   1   2018-04-02  2016-07-01  0   5979008.899131917    
2   1   2018-04-02  2016-08-03  33  -2609437.0145417987  
3   1   2018-04-02  2016-08-29  59  -21682.04267909576   
4   1   2018-04-02  2016-09-16  77  -4968554.060201097   
5   1   2018-04-02  2018-04-02  640 0.0  
6   2   2018-04-02  2017-09-08  0   -320912.83786916407  
7   2   2018-04-02  2017-09-27  19  3015.2821677139805   
8   2   2018-04-02  2018-03-28  201 3204.6920948425554   
9   2   2018-04-02  2018-04-02  206 440424.3826431843    

Ideally, I'm expecting a output table as below:

Row id  IRR  
1   1   3.2
2   2   0.8 

However, what I ended up is below output table:

Row id  IRR  
1   1   3.8
2   2   3.8 

I think the issue comes from when i call IRRCalc, everything is put to a single array, instead of being partitioned by id. You will see what I mean if you run below:

 select 
    array(select cash_flow from input), 
    array(select date_delta from input)
 from input

instead of IRRCalc(array(select cash_flow from input), array(select date_delta from input)). Can someone please have a look and let me know how to apply a partition by id logic on the two arrays cash_flow and date_delta before passing it to the function IRRCalc?

Upvotes: 0

Views: 254

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

below is what you are looking for most outer select statement

SELECT 
  id, 
  IRRCalc(ARRAY_AGG(cash_flow), ARRAY_AGG(date_delta)) AS IRR
FROM input
GROUP BY id 

it groups by id and forms respective arrays that are passed to your UDF - so result is id-specific
Assuming that logic for WITH input AS is correct - you should get expected result

Upvotes: 2

Related Questions