Reputation: 1866
I have an example calculation in Excel, which I need to convert into R code for a much larger data set.
My data should look like this:
time value cum_value floor scaled_time
0 0 0 0 0
1 1 1 1 1
2 0.975 1.975 1 1
3 0.95 2.925 3 2.038961039
4 0.925 3.85 4 3.098982099
5 0.9 4.75 5 4.185278042
6 0.875 5.625 6 5.302030016
7 0.85 6.475 7 6.453196107
Where the 'scaled time' column has been calculated in Excel using this type of formula (example shown is for row 6):
=scaled_time5+((floor6-floor5)/((cum_value6-floor5)/(time6-scaled_time5)))
Since this refers to cells in previous rows, I am having trouble coding this in R.
This is what I have so far (using data.table's shift function):
DF$Scaled_Time=shift(DF$Scaled_Time, 1L, "lag")+
((DF$Floor-shift(DF$Floor,1L,"lag"))/
((DF$Cum_Value-shift(DF$Floor,1L,"lag"))/
(DF$Time-shift(DF$Scaled_Time, 1L, "lag"))))
This doesn't work, and comes up with this error:
Error in `$<-.data.frame`(`*tmp*`, "Scaled_Time", value = numeric(0)) :
replacement has 0 rows, DF has 2246400
In addition: Warning messages:
1: In shift(DF$Floor, 1L, "lag") : NAs introduced by coercion
2: In shift(DF$Floor, 1L, "lag") : NAs introduced by coercion
Upvotes: 1
Views: 408
Reputation: 1721
Using dplyr, you can get the previous row's value with lag():
library(dplyr)
I recreated the dataframe:
vv <- data.frame(time=c(3,4,5,6,7),
value=c(0.95,0.925,0.9,0.875,0.85),
cum_value=c(3.925,4.85,5.75,6.625,7.475),
floor=c(3,4,5,6,7),
scaled_time=c(2.038961039,3.098982099,4.185278042,5.302030016,6.453196107))
Here is a simpler calculation to show the pattern; you can substitute your own formula.
time + ((floor - previous floor) / (cum_value - previous floor)) is written as:
> vv %>% mutate(V4=time+((floor-lag(floor,1))/(cum_value-lag(floor,1))))
time value cum_value floor scaled_time V4
1 3 0.950 3.925 3 2.038961 NA
2 4 0.925 4.850 4 3.098982 4.540541
3 5 0.900 5.750 5 4.185278 5.571429
4 6 0.875 6.625 6 5.302030 6.615385
5 7 0.850 7.475 7 6.453196 7.677966
If I didn't miss any parentheses in your original formula, it should look like this:
vv %>% mutate(V=lag(scaled_time,1)+
((floor-lag(floor,1))/
((cum_value-lag(floor,1))/(time-lag(scaled_time,1)))
)
)
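However, scaled_time is itself the output column: each row depends on the scaled_time just computed for the previous row, and the first row is initialized to 0 rather than calculated. lag() only sees values that already exist in the column, so it cannot express this recursion. One option is a for loop.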
EDIT: For Loop Solution
Although for loops should be a last resort, this is a quick solution for small data frames:
vv$scaled_time <- 0  # first row is the starting value; later rows are overwritten
for (i in 2:nrow(vv)) {
  prev <- vv$scaled_time[i - 1]  # already computed on the previous iteration
  vv$scaled_time[i] <- prev +
    (vv$floor[i] - vv$floor[i - 1]) /
    ((vv$cum_value[i] - vv$floor[i - 1]) / (vv$time[i] - prev))
}
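For a data set with millions of rows, the same recursion can be run on plain numeric vectors, which avoids repeated data frame indexing inside the loop. This is only a sketch: the function name scaled_time_recursive and its init argument are hypothetical, and seeding the first row with the Excel value 2.038961039 is an assumption made here so the output can be compared with the question's scaled_time column.

```r
# Hypothetical helper (not from the original posts): applies the Excel
# recursion to plain vectors and returns the scaled_time vector.
scaled_time_recursive <- function(time, floor, cum_value, init = 0) {
  n <- length(time)
  out <- numeric(n)
  if (n == 0L) return(out)
  out[1] <- init  # first row is given, not calculated
  for (i in seq_len(n)[-1]) {
    prev <- out[i - 1]
    # Same formula as the Excel cell, split into two steps for readability
    slope <- (cum_value[i] - floor[i - 1]) / (time[i] - prev)
    out[i] <- prev + (floor[i] - floor[i - 1]) / slope
  }
  out
}

vv <- data.frame(time      = c(3, 4, 5, 6, 7),
                 floor     = c(3, 4, 5, 6, 7),
                 cum_value = c(3.925, 4.85, 5.75, 6.625, 7.475))

# Assumption: seed with the Excel value for time 3 instead of 0
vv$scaled_time <- scaled_time_recursive(vv$time, vv$floor, vv$cum_value,
                                        init = 2.038961039)
print(vv)
```

Each row still depends on the previous result, so the loop remains sequential; the gain comes only from working on vectors and assigning the column once at the end.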
Upvotes: 1
Reputation: 2621
You can use the shift function from data.table.
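library(data.table)
setDT(df)  # := only works on a data.table, so convert df by reference
df[, result := 2.038961]  # starting value for every row
df[, result := shift(result) + ((floor - shift(floor)) / ((cum_value - shift(floor)) / (time - shift(result))))]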
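A likely source of the coercion warnings in the question: in data.table's shift(), the third positional argument is fill, not type, so shift(x, 1L, "lag") asks for the string "lag" as the padding value. A minimal sketch with named arguments, assuming data.table is installed (x is an illustrative vector, not taken from the question):

```r
library(data.table)

x <- c(3, 4, 5, 6, 7)

# Named arguments make the intent explicit: lag by one row, pad with NA.
lagged <- shift(x, n = 1L, type = "lag")
print(lagged)
```

Since "lag" is the default type, shift(x) alone gives the same result, which is the form used in the answer above.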
Upvotes: 2