Y. Z.
Y. Z.

Reputation: 377

Is there any way to replace a missing value based on another columns' value to match the column name

I have a dataset:

    a day day.1.time day.2.time day.3.time day.4.time day.5.time
1  NA   2          4          5          7         10          4
2  NA   5          4          1          1          6         NA
3  NA   3          7          9          6          7          4
4  NA   3          6          8          8          4          5
5  NA   3          5          2          4          5          6
6  NA   3         87          3          2          1         78
7  NA   1         NA          7          5          9         54
8  NA   5          6          6          3          2          3
9  NA   2          5         10          9          8          3
10 NA   3          9          4         10          3          3

I am trying to use the day column value to match with the day.x.time column to replace the missing value in column a. For instance, in the first row, the first value in the day column is 2, then we should use day.2.time value 5 to replace the first value in column a.

If the day.x.time value is missing, we should use -1 day or +1 day to replace the missing in column a. For instance, in the second row, the day column shows 5, so we should use the value in day.5.time column, but it's also a missing value. In this case, we should use the value in day.4.time column to replace the missing value in column a.

You can use dat = data.frame(a = rep(NA,10), day = c(2,5,3,3,3,3,1,5,2,3), day.1.time = c(4,4,7,6,5,87,NA,6,5,9), day.2.time = sample(10), day.3.time = sample(10), day.4.time = sample(10), day.5.time = c(4,NA,4,5,6,78,54,3,3,3)) to generate the sample data.

I have tried grep(paste0("^day."dat$day,".time$", names(dat)) to match with the column but my code isn't matching in every row, so any help would be appreciated!

Upvotes: 1

Views: 89

Answers (2)

jay.sf
jay.sf

Reputation: 73612

Using sapply to loop over the rows and subset by day[i] + 2 column.

res <- transform(dat, a=sapply(1:nrow(dat), function(i) dat[i, dat$day[i] + 2]))
res
#     a day day.1.time day.2.time day.3.time day.4.time day.5.time
# 1   5   2          4          5          7         10          4
# 2  NA   5          4          1          1          6         NA
# 3   6   3          7          9          6          7          4
# 4   8   3          6          8          8          4          5
# 5   4   3          5          2          4          5          6
# 6   2   3         87          3          2          1         78
# 7  NA   1         NA          7          5          9         54
# 8   3   5          6          6          3          2          3
# 9  10   2          5         10          9          8          3
# 10 10   3          9          4         10          3          3

Edit

The +/-2 days would require a decision rule, what to chose, if day is NA, but none of day - 1 and day + 1 is NA and both have the same values.

Here a solution that goes from day backwards and takes the first non-NA. If it is day one, as it's the case in row 7, we get NA.

res <- transform(dat, a=sapply(1:nrow(dat), function(i) {
  days <- dat[i, -(1:2)]
  day.value <- days[dat$day[i]]
  if (is.na(day.value)) {
    day.value <- tail(na.omit(unlist(days[1:dat$day[i]])), 1)
    if (length(day.value) == 0) day.value <- NA
  }
  return(day.value)
}))
res
#     a day day.1.time day.2.time day.3.time day.4.time day.5.time
# 1  10   2          4         10          1          2          4
# 2  10   5          4          1          3         10         NA
# 3   2   3          7          7          2          7          4
# 4   6   3          6          2          6          6          5
# 5  10   3          5          9         10          5          6
# 6   8   3         87          6          8          4         78
# 7  NA   1         NA          3          7          1         54
# 8   3   5          6          4          4          9          3
# 9   8   2          5          8          5          8          3
# 10  9   3          9          5          9          3          3

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389235

Here is one way to do this.

The first part is easy to match day column with the corresponding day.x.time column. We can do this using matrix subsetting.

cols <- grep('day\\.\\d+\\.time', names(dat))
dat$a <- dat[cols][cbind(1:nrow(dat), dat$day)]
dat
#    a day day.1.time day.2.time day.3.time day.4.time day.5.time
#1   3   2          4          3          3          3          4
#2  NA   5          4          4         10          2         NA
#3   1   3          7          8          1          8          4
#4   4   3          6          6          4          5          5
#5   6   3          5         10          6          7          6
#6   8   3         87          5          8          9         78
#7  NA   1         NA          1          7         10         54
#8   3   5          6          7          9          1          3
#9   2   2          5          2          5          6          3
#10  2   3          9          9          2          4          3

To fill values where day.x.time column is NA we can select the closest non-NA value in that row.

inds <- which(is.na(dat$a))
dat$a[inds] <- mapply(function(x, y) 
         na.omit(unlist(dat[x, cols[order(abs(y- seq_along(cols)))]])[1:4])[1],
              inds, dat$day[inds])

dat
#   a day day.1.time day.2.time day.3.time day.4.time day.5.time
#1  3   2          4          3          3          3          4
#2  2   5          4          4         10          2         NA
#3  1   3          7          8          1          8          4
#4  4   3          6          6          4          5          5
#5  6   3          5         10          6          7          6
#6  8   3         87          5          8          9         78
#7  1   1         NA          1          7         10         54
#8  3   5          6          7          9          1          3
#9  2   2          5          2          5          6          3
#10 2   3          9          9          2          4          3

Upvotes: 1

Related Questions