Reputation: 5227
I have a matrix in which some rows are NA in every column. I want to fill each such all-NA row with the K-th column value of the nearest preceding non-NA row.
For example, this matrix:
      [,1] [,2]
 [1,]   NA   NA
 [2,]   NA   NA
 [3,]    1    2
 [4,]    2    3
 [5,]   NA   NA
 [6,]   NA   NA
 [7,]   NA   NA
 [8,]    6    7
 [9,]    7    8
[10,]    8    9
It must be transformed into the following matrix, using the 2nd column for replacement (the leading NA rows have no previous row to copy from, so they stay NA):
      [,1] [,2]
 [1,]   NA   NA
 [2,]   NA   NA
 [3,]    1    2
 [4,]    2    3
 [5,]    3    3
 [6,]    3    3
 [7,]    3    3
 [8,]    6    7
 [9,]    7    8
[10,]    8    9
I wrote a function for this, but it uses a loop:
# Replaces each all-NA row with the K-th column value of the previous row,
# provided the previous row contains no NAs.
na.replace <- function(x, k) {
  cols <- ncol(x)
  for (i in 2:nrow(x)) {
    if (sum(is.na(x[i - 1, ])) == 0 && sum(is.na(x[i, ])) == cols) {
      x[i, ] <- x[i - 1, k]
    }
  }
  x
}
This function seems to work correctly, but I want to avoid the loop. Can anyone advise how to do this replacement without loops?
UPDATE
agstudy suggested a vectorized, loop-free solution:
na.replace <- function(mat, k) {
  idx <- which(rowSums(is.na(mat)) == ncol(mat))
  mat[idx, ] <- mat[ifelse(idx > 1, idx - 1, 1), k]
  mat
}
But this solution returns different (and wrong) results compared to my loop version. Why does this happen? In theory, the loop and non-loop solutions should be identical.
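A minimal reproduction of the mismatch, assuming the example matrix from the top of the question (the names loop.fill and vec.fill are only illustrative, not from the original posts). One likely explanation is that the vectorized assignment reads its right-hand side from the unmodified matrix, so within a run of consecutive NA rows only the first one has a non-NA predecessor, whereas the loop sees rows it has already filled:

```r
# The question's example matrix: rows 1-2 and 5-7 are entirely NA.
m <- cbind(c(NA, NA, 1, 2, NA, NA, NA, 6, 7, 8),
           c(NA, NA, 2, 3, NA, NA, NA, 7, 8, 9))

# Loop version: each iteration sees the rows already filled by earlier ones.
loop.fill <- function(x, k) {
  for (i in seq_len(nrow(x))[-1]) {
    if (!anyNA(x[i - 1, ]) && all(is.na(x[i, ]))) {
      x[i, ] <- x[i - 1, k]
    }
  }
  x
}

# Vectorized version: the right-hand side is evaluated once, on the
# original matrix, before any row is overwritten.
vec.fill <- function(mat, k) {
  idx <- which(rowSums(is.na(mat)) == ncol(mat))
  mat[idx, ] <- mat[ifelse(idx > 1, idx - 1, 1), k]
  mat
}

loop.res <- loop.fill(m, 2)
vec.res  <- vec.fill(m, 2)

loop.res[5:7, ]  # every row is 3 3
vec.res[5:7, ]   # row 5 is 3 3, rows 6 and 7 are still NA
```

So the two versions should agree only when the all-NA rows are isolated, never adjacent.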
Upvotes: 0
Views: 4400
Reputation:
I'd use the na.locf function from the zoo package, in a loop over the columns that uses the next column to generate a vector of replacement values. However, this may not be very efficient if your matrix is large.
library(zoo)
m <- cbind(
  c(NA, NA, 1, 2, NA, 4, NA, 6, 7, 8),
  c(NA, NA, 2, 3, NA, 5, NA, 7, 8, 9)
)
# Carry the last observation forward in the last column first
m[, ncol(m)] <- na.locf(m[, ncol(m)], na.rm = FALSE)
# Then walk the remaining columns right to left, filling NAs from the next column
for (i in rev(seq_len(ncol(m) - 1))) {
  replacement_values <- na.locf(m[, i + 1], na.rm = FALSE)
  m[is.na(m[, i]), i] <- replacement_values[is.na(m[, i])]
}
Upvotes: 2
Reputation: 333
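Try this function. It replaces NAs at any position in a vector: leading NAs take the first non-NA value, and all other NAs take the last non-NA value before them.
NA.replace <- function(x) {
  # Fill leading NAs with the first non-NA value
  i <- cumprod(is.na(x))
  x[!!i] <- x[which.min(i)]
  # Carry the last non-NA value forward over the remaining NAs
  if (length(x) > 0L) {
    non.na.idx <- which(!is.na(x))
    if (is.na(x[1L])) {
      non.na.idx <- c(1L, non.na.idx)
    }
    rep.int(x[non.na.idx], diff(c(non.na.idx, length(x) + 1L)))
  }
}
NA.replace(c(NA, 1, 2, NA, NA, 3, NA, NA, 4, NA))
# [1] 1 1 2 2 2 3 3 3 4 4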
Upvotes: 5
Reputation: 5227
I finally wrote my own vectorized version. It returns the expected output:
library(zoo)  # provides na.locf
na.replace <- function(x, k) {
  isNA <- is.na(x[, k])
  x[isNA, ] <- na.locf(x[, k], na.rm = FALSE)[isNA]
  x
}
UPDATE
A better solution that does not need any packages:
na.lomf <- function(x) {
  if (length(x) > 0L) {
    non.na.idx <- which(!is.na(x))
    if (is.na(x[1L])) {
      non.na.idx <- c(1L, non.na.idx)
    }
    rep.int(x[non.na.idx], diff(c(non.na.idx, length(x) + 1L)))
  }
}
na.lomf(c(NA, 1, 2, NA, NA, 3, NA, NA, 4, NA))
# [1] NA 1 2 2 2 3 3 3 4 4
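For completeness, a sketch of how this helper could be plugged into the matrix replacement in place of na.locf. The name na.replace.base is only illustrative, and the sketch assumes, as in the example data, that a row is NA in column k exactly when the whole row is NA:

```r
# Base-R last-observation-carried-forward for a vector (leading NAs are kept).
na.lomf <- function(x) {
  if (length(x) > 0L) {
    non.na.idx <- which(!is.na(x))
    if (is.na(x[1L])) {
      non.na.idx <- c(1L, non.na.idx)
    }
    rep.int(x[non.na.idx], diff(c(non.na.idx, length(x) + 1L)))
  }
}

# Hypothetical wrapper: fill every row that is NA in column k with the
# carried-forward value of column k, recycled across all columns.
na.replace.base <- function(x, k) {
  isNA <- is.na(x[, k])
  x[isNA, ] <- na.lomf(x[, k])[isNA]
  x
}

m <- cbind(c(NA, NA, 1, 2, NA, NA, NA, 6, 7, 8),
           c(NA, NA, 2, 3, NA, NA, NA, 7, 8, 9))
res <- na.replace.base(m, 2)
res[5:7, ]  # every row is 3 3; rows 1 and 2 remain NA
```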
Upvotes: 0
Reputation: 121568
Here is a new vectorized solution:
idx <- which(rowSums(is.na(mat)) == ncol(mat))
mat[idx, 1:2] <- mat[ifelse(idx > 1, idx - 1, 1), 2]
X..1. X..2.
[1,] NA NA
[2,] NA NA
[3,] 1 2
[4,] 2 3
[5,] 3 3
[6,] 4 5
[7,] 5 5
[8,] 6 7
[9,] 7 8
[10,] 8 9
You can wrap this in a function:
na.replace <- function(mat, k) {
  idx <- which(rowSums(is.na(mat)) == ncol(mat))
  mat[idx, ] <- mat[ifelse(idx > 1, idx - 1, 1), k]
  mat  # return the modified matrix
}
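The indexing above looks only one row back, so it appears to rely on all-NA rows never being adjacent. A possible variant for runs of consecutive NA rows is sketched below. It swaps in a different technique (a running maximum of row indices via cummax) and is not part of the original answer; the name na.replace.runs is hypothetical, and it assumes the nearest row that is not entirely NA has a usable value in column k:

```r
na.replace.runs <- function(mat, k) {
  # TRUE for rows that are NA in every column
  all.na <- rowSums(is.na(mat)) == ncol(mat)
  # For each row, the index of the most recent row that is not entirely NA
  # (0 while no such row has been seen yet)
  last.ok <- cummax(ifelse(all.na, 0L, seq_len(nrow(mat))))
  # Only fill all-NA rows that actually have a predecessor to copy from
  fill <- all.na & last.ok > 0L
  mat[fill, ] <- mat[last.ok[fill], k]
  mat
}

m <- cbind(c(NA, NA, 1, 2, NA, NA, NA, 6, 7, 8),
           c(NA, NA, 2, 3, NA, NA, NA, 7, 8, 9))
res <- na.replace.runs(m, 2)
res[5:7, ]  # every row is 3 3; the leading NA rows are left untouched
```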
Upvotes: 1