Reputation: 67
I have a data frame with 3500 observations and 278 variables. For each row, scanning from the first column, I want to replace every value that occurs after the first NA with NA. For instance, I want to go from a data frame like this:
X1 X2 X3 X4 X5
1 3 NA 6 9
1 NA 4 6 18
6 7 NA 3 1
10 1 2 NA 2
To something like
X1 X2 X3 X4 X5
1 3 NA NA NA
1 NA NA NA NA
6 7 NA NA NA
10 1 2 NA NA
I tried the following for loop, but it does not terminate:
for(i in 2:3500){
firstna <- min(which(is.na(df[i,])))
df[i, firstna:278] <- NA
}
Is there a more efficient way to do this? Thanks in advance.
Upvotes: 2
Views: 1271
Reputation: 1532
I did this using the cumany function from the dplyr package, which returns TRUE for every element from the first one that meets the condition onward.
df <- read.table(text = "X1 X2 X3 X4 X5
1 3 NA 6 9
1 NA 4 6 18
6 7 NA 3 1
10 1 2 NA 2 ",
header = TRUE)
library(plyr)
library(dplyr)
na_row_replace <- function(x){
x[which(cumany(is.na(x)))] <- NA
return(x)
}
adply(df, 1, na_row_replace)
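To see what the mask inside na_row_replace looks like, here is a minimal sketch on a single row. It uses cumsum(...) > 0 as a base R stand-in for cumany (an assumption on my part that the two agree for logical input without NAs, which is.na always produces), and x is simply the first row of the example data.

```r
# First row of the example data
x <- c(1, 3, NA, 6, 9)

# TRUE from the first NA onward; base R stand-in for cumany(is.na(x))
mask <- cumsum(is.na(x)) > 0
mask
# [1] FALSE FALSE  TRUE  TRUE  TRUE

# Blank out everything at or after the first NA
x[mask] <- NA
x
# [1]  1  3 NA NA NA
```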
Upvotes: 1
Reputation: 886938
We can use rowCumsums from the matrixStats package (d is the example data frame from the question):
library(matrixStats)
d*NA^rowCumsums(+(is.na(d)))
# X1 X2 X3 X4 X5
#1 1 3 NA NA NA
#2 1 NA NA NA NA
#3 6 7 NA NA NA
#4 10 1 2 NA NA
Or a base R option is
d*NA^do.call(cbind,Reduce(`+`,lapply(d, is.na), accumulate=TRUE))
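Both one-liners rely on how R evaluates powers of NA: anything raised to the power 0 is 1, while NA raised to a positive power stays NA. A small sketch on one row, with counts standing in for the running NA count (the variable names are only for illustration):

```r
# Running count of NAs seen so far in the row 1 3 NA 6 9
counts <- c(0, 0, 1, 1, 1)

# NA^0 is 1, NA^(anything > 0) is NA
multiplier <- NA^counts
multiplier
# [1]  1  1 NA NA NA

# Multiplying keeps values before the first NA and blanks the rest
x <- c(1, 3, NA, 6, 9)
result <- x * multiplier
result
# [1]  1  3 NA NA NA
```

Note that the multiplication only makes sense for numeric columns.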
Upvotes: 3
Reputation: 17611
You could do something like this:
# sample data
mat <- matrix(1, 10, 10)
set.seed(231)
mat[sample(100, 7)] <- NA
You can use apply with cumsum and is.na to keep track of where NAs need to be placed (i.e. places across the row where the cumulative sum of NAs is greater than 0). Then, use those locations to assign NAs to the original structure in the appropriate places.
mat[t(apply(is.na(mat), 1, cumsum)) > 0 ] <- NA
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 1 1 1 1 1 NA NA NA NA
# [2,] NA NA NA NA NA NA NA NA NA NA
# [3,] 1 1 1 1 1 1 1 1 1 1
# [4,] 1 1 1 1 1 1 1 1 1 1
# [5,] 1 1 1 NA NA NA NA NA NA NA
# [6,] 1 1 1 1 1 1 1 1 1 1
# [7,] 1 NA NA NA NA NA NA NA NA NA
# [8,] 1 1 1 1 1 1 1 1 1 1
# [9,] 1 1 1 1 1 1 1 1 1 1
#[10,] 1 1 NA NA NA NA NA NA NA NA
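The one-liner above can be unpacked into its intermediate steps. This is only a sketch on a small hypothetical matrix m (the first two rows of the question's data); na_flags and na_counts are names introduced here for illustration.

```r
# Two rows taken from the question's example
m <- matrix(c(1,  3, NA, 6,  9,
              1, NA,  4, 6, 18), nrow = 2, byrow = TRUE)

# Step 1: logical matrix marking the NAs
na_flags <- is.na(m)

# Step 2: running NA count along each row.
# apply() over rows returns each row's result as a column, so transpose back.
na_counts <- t(apply(na_flags, 1, cumsum))
na_counts
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    0    1    1    1
# [2,]    0    1    1    1    1

# Step 3: every cell with a count above 0 sits at or after the first NA
m[na_counts > 0] <- NA
m
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    3   NA   NA   NA
# [2,]    1   NA   NA   NA   NA
```

Rows without any NA keep a count of 0 throughout, so they are left untouched.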
This works fine with data frames too. Using the provided example data:
d<-read.table(text="
X1 X2 X3 X4 X5
1 3 NA 6 9
1 NA 4 6 18
6 7 NA 3 1
10 1 2 NA 2 ", header=TRUE)
d[t(apply(is.na(d), 1, cumsum)) > 0 ] <- NA
# X1 X2 X3 X4 X5
#1 1 3 NA NA NA
#2 1 NA NA NA NA
#3 6 7 NA NA NA
#4 10 1 2 NA NA
Upvotes: 8