Prasad

Reputation: 67

R: Replace all values in each row of a data frame after the first NA with NA

I have a dataframe of 3500 observations and 278 variables. For each row going from the first column, I want to replace all values occurring after the first NA by NAs. For instance, I want to go from a dataframe like so:

X1 X2 X3 X4 X5
 1  3 NA  6  9
 1 NA  4  6 18
 6  7 NA  3  1 
10  1  2 NA  2 

To something like

X1 X2 X3 X4 X5
 1  3 NA NA NA
 1 NA NA NA NA
 6  7 NA NA NA 
10  1  2 NA NA   

I tried using the following for loop, but it does not finish:

for(i in 2:3500){
 firstna <- min(which(is.na(df[i,])))
 df[i, firstna:278] <- NA
}

Is there a more efficient way to do this? Thanks in advance.

Upvotes: 2

Views: 1271

Answers (3)

NGaffney

Reputation: 1532

I did this using the cumany function from the dplyr package, which returns TRUE for every element from the first one where the condition is met onward. The adply function from plyr then applies the replacement to each row.

df <- read.table(text = "X1 X2 X3 X4 X5
                         1  3 NA  6  9
                         1 NA  4  6 18
                         6  7 NA  3  1 
                         10  1  2 NA  2 ",
                 header = TRUE)

library(plyr)
library(dplyr)

na_row_replace <- function(x){
  x[which(cumany(is.na(x)))] <- NA
  return(x)
}

adply(df, 1, na_row_replace)
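
To see what the row function is doing, here is a minimal base R sketch on a single row. It assumes that, for a logical vector without missing values, cumany(cond) gives the same result as cumsum(cond) > 0; the variable names are only for illustration.

```r
# Second row of the example data
x <- c(1, NA, 4, 6, 18)

# TRUE from the first NA onward (base R stand-in for cumany(is.na(x)))
after_first_na <- cumsum(is.na(x)) > 0

# Blank out everything from the first NA to the end of the row
x[after_first_na] <- NA
```

After this, x is 1 followed by four NAs, which matches the second row of the desired output.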

Upvotes: 1

akrun

Reputation: 886938

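We can use rowCumsums from library(matrixStats). Here d is the example data from the question, read into a data frame.

library(matrixStats)
# rowCumsums counts the NAs seen so far in each row;
# NA^0 is 1, and NA raised to a positive count stays NA
d * NA^rowCumsums(+(is.na(d)))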
#  X1 X2 X3 X4 X5
#1  1  3 NA NA NA
#2  1 NA NA NA NA
#3  6  7 NA NA NA
#4 10  1  2 NA NA

Or a base R option is

d*NA^do.call(cbind,Reduce(`+`,lapply(d, is.na), accumulate=TRUE))
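
Both one-liners rely on how R evaluates powers of NA. A small sketch on one row, with illustrative variable names, may make the trick easier to follow:

```r
# First row of the example data
row <- c(1, 3, NA, 6, 9)

# Running count of NAs seen so far: 0 0 1 1 1
na_count <- cumsum(is.na(row))

# NA^0 is 1; NA raised to any positive count is NA
mask <- NA^na_count

# Multiplying by the mask keeps values before the first NA and blanks the rest
masked <- row * mask
```

Here mask is 1 1 NA NA NA, so masked keeps 1 and 3 and turns the remaining positions into NA.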

Upvotes: 3

Jota

Reputation: 17611

You could do something like this:

# sample data
mat <- matrix(1, 10, 10)
set.seed(231)
mat[sample(100, 7)] <- NA

You can use apply with cumsum and is.na to keep track of where NAs need to be placed (i.e. places across the row where the cumulative sum of NAs is greater than 0). Then, use those locations to assign NAs to the original structure in the appropriate places.

mat[t(apply(is.na(mat), 1, cumsum)) > 0 ] <- NA
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    1    1   NA   NA   NA    NA
# [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
# [3,]    1    1    1    1    1    1    1    1    1     1
# [4,]    1    1    1    1    1    1    1    1    1     1
# [5,]    1    1    1   NA   NA   NA   NA   NA   NA    NA
# [6,]    1    1    1    1    1    1    1    1    1     1
# [7,]    1   NA   NA   NA   NA   NA   NA   NA   NA    NA
# [8,]    1    1    1    1    1    1    1    1    1     1
# [9,]    1    1    1    1    1    1    1    1    1     1
#[10,]    1    1   NA   NA   NA   NA   NA   NA   NA    NA

This works fine with data frames too. Using the provided example data:

d<-read.table(text="
X1 X2 X3 X4 X5
 1  3 NA  6  9
 1 NA  4  6 18
 6  7 NA  3  1 
10  1  2 NA  2 ", header=TRUE)

d[t(apply(is.na(d), 1, cumsum)) > 0 ] <- NA
#  X1 X2 X3 X4 X5
#1  1  3 NA NA NA
#2  1 NA NA NA NA
#3  6  7 NA NA NA
#4 10  1  2 NA NA
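
If you would rather keep an explicit loop, here is a sketch of how the loop from the question could be repaired. Two things stand out in the original: it starts at row 2, and min(which(...)) returns Inf with a warning when a row contains no NA, so the following assignment cannot work for such rows. The fifth row below has no NA and is my own addition to exercise that case; converting to a matrix first is also my choice, since indexing rows of a matrix is typically cheaper than indexing rows of a data frame.

```r
# Example data from the question plus one extra row without any NA (added for illustration)
d <- read.table(text = "X1 X2 X3 X4 X5
1 3 NA 6 9
1 NA 4 6 18
6 7 NA 3 1
10 1 2 NA 2
5 5 5 5 5", header = TRUE)

m <- as.matrix(d)

for (i in seq_len(nrow(m))) {        # start at row 1, not row 2
  na_pos <- which(is.na(m[i, ]))
  if (length(na_pos) > 0) {          # rows without NA are left untouched
    m[i, na_pos[1]:ncol(m)] <- NA    # blank from the first NA to the last column
  }
}

res <- as.data.frame(m)
```

The vectorised cumsum approach above is shorter, but the loop makes the edge case for rows without any NA explicit.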

Upvotes: 8
