Reputation: 23
I have two columns: Time and Value. Time is continuous and has no blanks. Value, however, contains data that was sampled at random points, so there are gaps of random length between values.
Here's a very simple sample dataset:
df <- data.frame(Time=1:10, Value=c("2", NA, NA, NA, "6", NA, NA, "7", NA, "3"))
I would like to create a third column, "Estimate". Under this new column:
For example, for Time 2
Essentially I'm just making an equally weighted transition from one value to the next. I'm not concerned with anything before the first value or after the last value (if there were NAs before Time 1 or after Time 10).
THE QUESTION:
Being very much a newbie, I'm not quite sure how to best approach coding for the Estimate column, when Value is blank. I've tried generating a vector of row numbers for rows with actual values, thinking I could use that as index reference. I then tried to do a loop where it would take row A and row B (from the vector of row numbers), calculate the increment, then add the increment to the last cell. However, I couldn't figure out how to make both A & B increase by 1 at the same time (such that it did a "rolling window" down my vector of row numbers). I also suspect this is not a good way of approaching this problem...but don't know what my options are.
Any guidance and pointing in the right direction would be greatly appreciated!
Upvotes: 2
Views: 1150
Reputation: 37641
Since you are treating the values in df$Value as numbers, I assume that you wanted numbers, not strings.
df <- data.frame(Time=1:10, Value=c(2, NA, NA, NA, 6, NA, NA, 7, NA, 3))
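If your real data arrives as strings, as in the question, something like the following should turn the column into numbers first. This is only a minimal sketch: it assumes every non-missing entry is a plain number, and `df_chr` is just an illustrative name, not something from your data.

```r
# Illustrative copy of the question's data, with Value stored as strings
df_chr <- data.frame(Time = 1:10,
                     Value = c("2", NA, NA, NA, "6", NA, NA, "7", NA, "3"))

# as.character() first guards against Value being a factor (the default
# for strings in data.frame() before R 4.0); NA entries stay NA
df_chr$Value <- as.numeric(as.character(df_chr$Value))

# Inspect the converted column
str(df_chr$Value)
```

Any entry that is not a valid number would become NA with a warning, so it is worth checking the result before interpolating.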
What you are asking for is linear interpolation, which is provided by the R function approxfun.
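# Interpolating function built only from the rows where Value is known
AF <- approxfun(df[complete.cases(df), 1], df[complete.cases(df), 2])
# Keep observed values; fill the gaps from AF
ifelse(is.na(df$Value), AF(df$Time), df$Value)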
[1] 2.000000 3.000000 4.000000 5.000000 6.000000 6.333333 6.666667 7.000000
[9] 5.000000 3.000000
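As for the "rolling window" loop you were attempting: one way to move A and B together is to loop over positions in the vector of known row numbers and take the element at `i` and the one at `i + 1`. The sketch below is only meant to show what the interpolation does by hand. It assumes Time is evenly spaced, as in the sample data, and the names `known`, `a`, `b` and `step` are mine.

```r
df <- data.frame(Time = 1:10, Value = c(2, NA, NA, NA, 6, NA, NA, 7, NA, 3))

# Row numbers that hold an actual value
known <- which(!is.na(df$Value))

# Start from the observed values; the gaps are filled in below
df$Estimate <- df$Value

# Walk over consecutive pairs (a, b) of known rows;
# max(..., 0) keeps the loop empty if there are fewer than two known rows
for (i in seq_len(max(length(known) - 1, 0))) {
  a <- known[i]
  b <- known[i + 1]
  # Equal increment between the two known values
  # (valid here because Time is evenly spaced)
  step <- (df$Value[b] - df$Value[a]) / (b - a)
  df$Estimate[a:b] <- df$Value[a] + step * (0:(b - a))
}

df$Estimate
```

On this dataset the loop should give the same numbers as the approxfun version above, and base R's `approx(df$Time, df$Value, xout = df$Time)$y` should agree as well. Rows before the first or after the last known value are left as NA, which matches what you said you wanted.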
Upvotes: 2