AK88
AK88

Reputation: 3026

Interpolate between two dynamic days

We have a file that is updated every week and its structure is as follows:

      ID            Code    Days  Yield
KZW1KM093436    NTK273_1343 4    2.1781
KZW1KD913708    NTK091_1370 11   2.1683
KZK1KY011222    MKM012_0122 17   2.1602
KZW1KM063603    NTK182_1360 32   2.1419
KZW1KM093477    NTK273_1347 32   2.1419
KZW1KD913740    NTK091_1374 39   2.1342
KZW1KM063629    NTK182_1362 46   2.1269
KZW1KM093501    NTK273_1350 53   2.1202

I have the following static data points (Days) for which I need Yields on a weekly basis:

28
91
182
273
364
730
1825
2555
3640

When the file gets updated, the structure of the data also changes (sorted by Days). There are several possible scenarios:

I was thinking about doing it in Excel/VBA, but would also like to see R solution. Any help would be greatly appreciated.

Expected output for Yield is calculated according to this formula:

enter image description here

So in this case for 28 Days I'd have Yield (28 - 17)*(2.1419-2.1602)/(32-17)+2.1602 = 2.1468

Upvotes: 0

Views: 610

Answers (1)

Onyambu
Onyambu

Reputation: 79208

 A=read.table(text="ID            Code    Days  Yield
                KZW1KM093436    NTK273_1343 4   2.1781
                KZW1KD913708    NTK091_1370 11  2.1683
                KZK1KY011222    MKM012_0122 17  2.1602
                KZW1KM063603    NTK182_1360 32  2.1419
                KZW1KM093477    NTK273_1347 32  2.1419
                KZW1KD913740    NTK091_1374 39  2.1342
                KZW1KM063629    NTK182_1362 46  2.1269
                KZW1KM093501    NTK273_1350 53  2.1202",header=T)

 k=c(28L, 91L, 182L, 273L, 364L, 730L, 1825L, 2555L, 3640L)

Now first find the interval where the value to be predicted lies

 funfun=function(x){
             if(x%in%A$Days)return(A[A$Days==x,3:4])
             v=findInterval(x,A$Days);na.omit(A[c(v,v+1),3:4])}

eg:

 funfun(28)
  Days  Yield
3   17 2.1602
4   32 2.1419
funfun(34)
  Days  Yield
5   32 2.1419
6   39 2.1342
funfun(4)
  Days  Yield
1    4 2.1781
funfun(0)
  Days  Yield
1    4 2.1781
funfun(345)
  Days  Yield
8   53 2.1202
 funfun(11)
  Days  Yield
2   11 2.1683

Next we write a function that computes the predicted value:

 funfun1=function(i){
   s=funfun(i)
   if(nrow(s)==1)return(s$Yield)
   (i-s$Days[1])*Reduce("/",rev(sapply(s,diff)))+s$Yield[1]
 }

sapply(k,funfun1)
[1] 2.14678 2.12020 2.12020 2.12020 2.12020 2.12020 2.12020 2.12020 2.12020

Most of the values given in K lie outside the range of the days we have thus only taking the last value. What if we had different values? say:

  k=c(4,11,17,18,20,33,48,50)
  sapply(k,funfun1)
  [1] 2.178100 2.168300 2.160200 2.158980 2.156540 2.140800 2.124986 2.123071

We see for those values already in the data, we get their yields. Those above the data, we obtain the last yield in the data and those in between we estimate as required.

Hope this will help

Upvotes: 2

Related Questions