Reputation: 499
I have a data set with some missing values in a sequence:
Seq<-c(1,2,3,4,6,7,10,11,12,18,19,20)
Data<-c(3,4,5,4,3,2,1,2,3,5,4,3)
DF<-data.frame(Seq, Data)
I'd like to add rows to this data set at approximately the positions where values are missing, and fill in the data with NA. Any time there is a gap larger than 2, I add an NA row (or multiple rows if the gap is large). The result would look something like this:
NewSeq<-c(1,2,3,4,6,7,8.5,10,11,12,14,16,18,19,20)
NewData<-c(3,4,5,4,3,2,NA,1,2,3,NA,NA,5,4,3)
NewDF<-data.frame(NewSeq,NewData)
So I ignore gaps of 2 or less, but I add an NA row any time there is a gap > 2. If a gap > 2 still remains after adding an NA row, I add another until the gap is filled.
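To make the rule concrete, here is a minimal sketch of how the gaps that need filling could be located with `diff()`. The names `gaps`, `too_wide` and `wide_gaps` are only illustrative, and the threshold of 2 is taken from the description above.

```r
Seq <- c(1, 2, 3, 4, 6, 7, 10, 11, 12, 18, 19, 20)

# Distance between each value and the next one
gaps <- diff(Seq)

# Positions whose following gap is larger than the threshold of 2
too_wide <- which(gaps > 2)

# Summary of the gaps that would need one or more NA rows
wide_gaps <- data.frame(from = Seq[too_wide],
                        to   = Seq[too_wide + 1],
                        gap  = gaps[too_wide])
print(wide_gaps)
```

For this data only the gaps 7 to 10 and 12 to 18 are flagged; the gap of exactly 2 between 4 and 6 is left alone.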
Upvotes: 1
Views: 449
Reputation: 1095
Seems to work for your example, but I'm not sure how it will perform on data I haven't seen. You'll need to adjust the intervals in the if/else statement depending on how you want to handle different gap sizes.
Seq<-c(1,2,3,4,6,7,10,11,12,18,19,20)
Data<-c(3,4,5,4,3,2,1,2,3,5,4,3)
DF<-data.frame(Seq, Data)
diffs <- diff(Seq)
inds <- which(diffs > 2)
# lapply (not sapply) so the result is always a list,
# even when every gap needs the same number of new rows
new.vals <- lapply(inds, function(x) {
  if (diffs[x] %% 2 != 0) {
    # odd gap: step by 1.5
    seq(Seq[x] + 1.5, Seq[x + 1] - 1.5, 1.5)
  } else {
    # even gap: step by 2
    seq(Seq[x] + 2, Seq[x + 1] - 2, 2)
  }
})
# number of rows to insert after each flagged position
add.length <- lengths(new.vals)
# sort key: inserted rows get index + 0.5 so they land right after their gap
id <- c(seq_along(Seq), rep(inds + 0.5, add.length))
Seq.new <- c(Seq, unlist(new.vals))[order(id)]
Data.new <- c(Data, rep(NA, sum(add.length)))[order(id)]
NewDF <- data.frame(Seq.new, Data.new)
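If the fixed 1.5 and 2 steps turn out to be too rigid, one possible generalisation is to work out how many rows each gap needs and space them evenly. The following is only a sketch under that assumption: `fill_gaps` and its `max_gap` argument are hypothetical names, not part of the code above, and it has only been checked against the example data.

```r
# Hypothetical helper: insert evenly spaced NA rows wherever the gap
# between consecutive values exceeds max_gap.
fill_gaps <- function(seq_vals, data_vals, max_gap = 2) {
  gaps <- diff(seq_vals)
  # Rows needed after each element so that no remaining gap exceeds max_gap
  n_new <- c(pmax(ceiling(gaps / max_gap) - 1, 0), 0)
  pieces <- lapply(seq_along(seq_vals), function(i) {
    k <- n_new[i]
    if (k == 0) {
      return(data.frame(Seq = seq_vals[i], Data = data_vals[i]))
    }
    step <- gaps[i] / (k + 1)  # even spacing inside this gap
    data.frame(Seq  = c(seq_vals[i], seq_vals[i] + step * seq_len(k)),
               Data = c(data_vals[i], rep(NA, k)))
  })
  do.call(rbind, pieces)
}

Seq  <- c(1, 2, 3, 4, 6, 7, 10, 11, 12, 18, 19, 20)
Data <- c(3, 4, 5, 4, 3, 2, 1, 2, 3, 5, 4, 3)
filled <- fill_gaps(Seq, Data)
print(filled)
```

With `max_gap = 2` this gives 8.5 for the gap of 3 and 14, 16 for the gap of 6, the same positions as in the question. Unlike the fixed steps, it also handles gaps such as 7 without leaving a spacing above 2.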
Upvotes: 1
Reputation: 2826
Not very elegant, but this is how I would do it:
Seq<-c(1,2,3,4,6,7,10,11,12,18,19,20)
Data<-c(3,4,5,4,3,2,1,2,3,5,4,3)
DF<-data.frame(Seq, Data)
first <- DF$Seq
second <- DF$Data
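# walk backwards so insertions do not shift the indices still to be visited
for (i in length(first):2) {
  gap <- first[i] - first[i - 1]
  if (gap > 2) {
    # number of NA rows to insert into this gap
    steps <- if (gap %% 2 == 1) gap %/% 2 else (gap %/% 2) - 1
    new_values_gap <- gap / (steps + 1)
    # evenly spaced positions strictly between the two neighbours
    new_values <- first[i - 1] + seq_len(steps) * new_values_gap
    # seq_len(i - 1) avoids the precedence trap in 1:i - 1, which parses as (1:i) - 1
    first <- c(first[seq_len(i - 1)], new_values, first[i:length(first)])
    second <- c(second[seq_len(i - 1)], rep(NA, length(new_values)), second[i:length(second)])
  }
}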
NewDF <- data.frame(NewSeq = first, NewData = second)
> NewDF
##    NewSeq NewData
## 1     1.0       3
## 2     2.0       4
## 3     3.0       5
## 4     4.0       4
## 5     6.0       3
## 6     7.0       2
## 7     8.5      NA
## 8    10.0       1
## 9    11.0       2
## 10   12.0       3
## 11   14.0      NA
## 12   16.0      NA
## 13   18.0       5
## 14   19.0       4
## 15   20.0       3
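A quick sanity check of a result like the one printed above might look as follows. This is only a sketch: the data frame is retyped from the output so the snippet runs on its own, and `max_gap_ok` and `originals_kept` are illustrative names.

```r
Seq  <- c(1, 2, 3, 4, 6, 7, 10, 11, 12, 18, 19, 20)
Data <- c(3, 4, 5, 4, 3, 2, 1, 2, 3, 5, 4, 3)

# Result retyped from the printed output
NewDF <- data.frame(
  NewSeq  = c(1, 2, 3, 4, 6, 7, 8.5, 10, 11, 12, 14, 16, 18, 19, 20),
  NewData = c(3, 4, 5, 4, 3, 2, NA, 1, 2, 3, NA, NA, 5, 4, 3)
)

# No remaining gap should exceed 2
max_gap_ok <- all(diff(NewDF$NewSeq) <= 2)

# Dropping the NA rows should give back the original data unchanged
kept <- NewDF[!is.na(NewDF$NewData), ]
originals_kept <- isTRUE(all.equal(kept$NewSeq, Seq)) &&
  isTRUE(all.equal(kept$NewData, Data))

print(c(max_gap_ok = max_gap_ok, originals_kept = originals_kept))
```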
Upvotes: 1