Interpolate between two dynamic days

Question

We have a file that is updated every week and its structure is as follows:

      ID            Code    Days  Yield
KZW1KM093436    NTK273_1343 4    2.1781
KZW1KD913708    NTK091_1370 11   2.1683
KZK1KY011222    MKM012_0122 17   2.1602
KZW1KM063603    NTK182_1360 32   2.1419
KZW1KM093477    NTK273_1347 32   2.1419
KZW1KD913740    NTK091_1374 39   2.1342
KZW1KM063629    NTK182_1362 46   2.1269
KZW1KM093501    NTK273_1350 53   2.1202

I have the following static data points (Days) for which I need Yields on a weekly basis:

When the file gets updated, the structure of the data also changes (sorted by Days). There are several possible scenarios:

If the value in Days column of the data is equal to any of the values in my Static data, then I'd like to take the corresponding value from Yield column.
Otherwise, I'd like to take the closest upper and lower values from Days and get interpolated Yield (eg. for my static 28 days in this particular case I'd have to use these two rows: KZK1KY011222 MKM012_0122 17 2.1602 and KZW1KM063603 NTK182_1360 32 2.1419).
If there are no beginning or ending values, which mostly applicable to 28 and 3640, then I want to take the Yield from the closest Days.

I was thinking about doing it in Excel/VBA, but would also like to see R solution. Any help would be greatly appreciated.

Expected output for Yield is calculated according to this formula:

So in this case for 28 Days I'd have Yield (28 - 17)*(2.1419-2.1602)/(32-17)+2.1602 = 2.1468

Onyambu · Accepted Answer

 A=read.table(text="ID            Code    Days  Yield
                KZW1KM093436    NTK273_1343 4   2.1781
                KZW1KD913708    NTK091_1370 11  2.1683
                KZK1KY011222    MKM012_0122 17  2.1602
                KZW1KM063603    NTK182_1360 32  2.1419
                KZW1KM093477    NTK273_1347 32  2.1419
                KZW1KD913740    NTK091_1374 39  2.1342
                KZW1KM063629    NTK182_1362 46  2.1269
                KZW1KM093501    NTK273_1350 53  2.1202",header=T)

 k=c(28L, 91L, 182L, 273L, 364L, 730L, 1825L, 2555L, 3640L)

Now first find the interval where the value to be predicted lies

 funfun=function(x){
             if(x%in%A$Days)return(A[A$Days==x,3:4])
             v=findInterval(x,A$Days);na.omit(A[c(v,v+1),3:4])}

eg:

 funfun(28)
  Days  Yield
3   17 2.1602
4   32 2.1419
funfun(34)
  Days  Yield
5   32 2.1419
6   39 2.1342
funfun(4)
  Days  Yield
1    4 2.1781
funfun(0)
  Days  Yield
1    4 2.1781
funfun(345)
  Days  Yield
8   53 2.1202
 funfun(11)
  Days  Yield
2   11 2.1683

Next we write a function that computes the predicted value:

 funfun1=function(i){
   s=funfun(i)
   if(nrow(s)==1)return(s$Yield)
   (i-s$Days[1])*Reduce("/",rev(sapply(s,diff)))+s$Yield[1]
 }

sapply(k,funfun1)
[1] 2.14678 2.12020 2.12020 2.12020 2.12020 2.12020 2.12020 2.12020 2.12020

Most of the values given in K lie outside the range of the days we have thus only taking the last value. What if we had different values? say:

  k=c(4,11,17,18,20,33,48,50)
  sapply(k,funfun1)
  [1] 2.178100 2.168300 2.160200 2.158980 2.156540 2.140800 2.124986 2.123071

We see for those values already in the data, we get their yields. Those above the data, we obtain the last yield in the data and those in between we estimate as required.

Hope this will help

Interpolate between two dynamic days

Answers (1)

Related Questions