Reputation: 3026
We have a file that is updated every week and its structure is as follows:
ID Code Days Yield
KZW1KM093436 NTK273_1343 4 2.1781
KZW1KD913708 NTK091_1370 11 2.1683
KZK1KY011222 MKM012_0122 17 2.1602
KZW1KM063603 NTK182_1360 32 2.1419
KZW1KM093477 NTK273_1347 32 2.1419
KZW1KD913740 NTK091_1374 39 2.1342
KZW1KM063629 NTK182_1362 46 2.1269
KZW1KM093501 NTK273_1350 53 2.1202
I have the following static data points (Days) for which I need Yields
on a weekly basis:
28
91
182
273
364
730
1825
2555
3640
When the file gets updated, the structure of the data also changes (sorted by Days
). There are several possible scenarios:
Days
column of the data is equal to any of the
values in my Static
data, then I'd like to take the corresponding
value from Yield
column.Days
and get interpolated Yield
(eg. for my static 28
days in
this particular case I'd have to use these two rows: KZK1KY011222 MKM012_0122 17 2.1602
and KZW1KM063603 NTK182_1360 32 2.1419
).28
and 3640
, then I want to take the Yield
from the closest
Days
.I was thinking about doing it in Excel/VBA, but would also like to see R solution. Any help would be greatly appreciated.
Expected output for Yield
is calculated according to this formula:
So in this case for 28
Days I'd have Yield
(28 - 17)*(2.1419-2.1602)/(32-17)+2.1602 = 2.1468
Upvotes: 0
Views: 610
Reputation: 79208
A=read.table(text="ID Code Days Yield
KZW1KM093436 NTK273_1343 4 2.1781
KZW1KD913708 NTK091_1370 11 2.1683
KZK1KY011222 MKM012_0122 17 2.1602
KZW1KM063603 NTK182_1360 32 2.1419
KZW1KM093477 NTK273_1347 32 2.1419
KZW1KD913740 NTK091_1374 39 2.1342
KZW1KM063629 NTK182_1362 46 2.1269
KZW1KM093501 NTK273_1350 53 2.1202",header=T)
k=c(28L, 91L, 182L, 273L, 364L, 730L, 1825L, 2555L, 3640L)
Now first find the interval where the value to be predicted lies
funfun=function(x){
if(x%in%A$Days)return(A[A$Days==x,3:4])
v=findInterval(x,A$Days);na.omit(A[c(v,v+1),3:4])}
eg:
funfun(28)
Days Yield
3 17 2.1602
4 32 2.1419
funfun(34)
Days Yield
5 32 2.1419
6 39 2.1342
funfun(4)
Days Yield
1 4 2.1781
funfun(0)
Days Yield
1 4 2.1781
funfun(345)
Days Yield
8 53 2.1202
funfun(11)
Days Yield
2 11 2.1683
Next we write a function that computes the predicted value:
funfun1=function(i){
s=funfun(i)
if(nrow(s)==1)return(s$Yield)
(i-s$Days[1])*Reduce("/",rev(sapply(s,diff)))+s$Yield[1]
}
sapply(k,funfun1)
[1] 2.14678 2.12020 2.12020 2.12020 2.12020 2.12020 2.12020 2.12020 2.12020
Most of the values given in K lie outside the range of the days we have thus only taking the last value. What if we had different values? say:
k=c(4,11,17,18,20,33,48,50)
sapply(k,funfun1)
[1] 2.178100 2.168300 2.160200 2.158980 2.156540 2.140800 2.124986 2.123071
We see for those values already in the data, we get their yields. Those above the data, we obtain the last yield in the data and those in between we estimate as required.
Hope this will help
Upvotes: 2