How to replace values in a column in a data.frame not equal to randomly selected values with NAs?

Question

I randomly selected 30 values from variable a in the df data.frame.

set.seed(123) 
date <- as.Date(seq(as.Date("2003-01-01"), as.Date("2003-05-31"), by = 1), format="%Y-%m-%d") 
a    <- runif(151, 0.005, 2.3) 
df   <- data.frame(date, a) 
# select 30 random samples
rans <- sample(length(df$a), 30)

I tried this and it replaced all values in df$a that are equal to rans with NAs.

df[,2][rans] <- NA

But I want to replace all values in df$a that are NOT EQUAL to rans with NAs, so I tried the following, but it didn't work:

df[,2][!rans] <- NA            #didn't work           
df[,2][!rans %in% df] <- NA    #replaced all values in df$a with NAs

Any suggestions how to do that?

akrun · Accepted Answer

It may be better not to use a negative index here; use setdiff instead. setdiff gives the row indices that are not found in 'rans', and we then assign NA to the 2nd column for those rows.

df[setdiff(seq_len(nrow(df)), rans),2] <- NA
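
One possible reason to prefer setdiff over a negative index (an assumption here; the answer does not spell it out) is the empty-selection edge case: `-integer(0)` is still an empty index, so nothing gets replaced. A minimal sketch on a toy vector, where `x`, `keep`, `none`, `neg`, and `sd_all` are made-up names for illustration:

```r
x    <- c(0.4, 1.2, 0.9, 2.1, 1.7)   # toy data, not the OP's values
keep <- c(2, 5)                      # positions to keep

# setdiff: positions 1..5 that are not in keep
y <- x
y[setdiff(seq_along(y), keep)] <- NA
y                                    # NA 1.2 NA NA 1.7

# Edge case: nothing was sampled
none <- integer(0)

neg <- x
neg[-none] <- NA                     # -integer(0) selects nothing, so neg stays equal to x

sd_all <- x
sd_all[setdiff(seq_along(sd_all), none)] <- NA   # selects every position, so all become NA
```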

Or, instead of setdiff, we can use %in% to get a logical vector that is TRUE for the row indices found in 'rans', and then negate it (!) so that TRUE becomes FALSE and FALSE becomes TRUE. We then assign NA to the 2nd column for the rows where the negated vector is TRUE.

df[!(seq_len(nrow(df)) %in% rans), 2] <- NA
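
As a quick sanity check, the sketch below rebuilds the data from the question and compares the %in% approach with the setdiff one. The names `keep`, `by_in`, and `by_setdiff` are introduced here for illustration only:

```r
set.seed(123)
date <- seq(as.Date("2003-01-01"), as.Date("2003-05-31"), by = 1)
a    <- runif(151, 0.005, 2.3)
df   <- data.frame(date, a)
rans <- sample(nrow(df), 30)

keep <- seq_len(nrow(df)) %in% rans   # TRUE for the 30 sampled rows

by_in <- df
by_in[!keep, 2] <- NA                 # blank every row that was not sampled

by_setdiff <- df
by_setdiff[setdiff(seq_len(nrow(df)), rans), 2] <- NA

sum(keep)                             # 30 rows kept
sum(!is.na(by_in$a))                  # 30 non-NA values remain
identical(by_in$a, by_setdiff$a)      # compares the two results
```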

If we use data.table, we convert the 'data.frame' to a 'data.table' (setDT(df)) and assign NA to 'a' for the rows that don't satisfy the condition (as mentioned above).

library(data.table)
setDT(df)[!(1:.N %in% rans), a:= NA]
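
Note that setDT() converts 'df' by reference, so the original object itself is changed. If that is not wanted, a sketch that works on a copy might look like this. It assumes the data.table package is installed; `dt` is a name introduced here, and the data is rebuilt from the question so the block runs on its own:

```r
library(data.table)

set.seed(123)
df   <- data.frame(date = seq(as.Date("2003-01-01"), as.Date("2003-05-31"), by = 1),
                   a    = runif(151, 0.005, 2.3))
rans <- sample(nrow(df), 30)

dt <- as.data.table(df)                  # a copy; df stays a plain data.frame
dt[!(1:.N %in% rans), a := NA_real_]     # NA_real_ matches the numeric column type

sum(!is.na(dt$a))                        # 30
anyNA(df$a)                              # FALSE, df was not modified
```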

Why didn't the OP's code work?

First option
```
df[,2][!rans] <- NA
```
didn't work because
```
!rans
#[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
```
gives all FALSE values. When applied to a numeric vector, the negation operator (!) converts every 0 to TRUE and every other value to FALSE. As 'rans' holds row positions between 1 and 151 and therefore no 0, all of its elements become FALSE. Assigning through a logical index that is all FALSE selects nothing, so no value in the 2nd column is replaced with NA.
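
The behaviour of ! on a numeric vector can be checked on a small example. The vectors `idx` and `x` below are made up for illustration:

```r
!c(0, 1, 5)
# [1]  TRUE FALSE FALSE

idx <- c(3, 1, 4)        # positive indices, like 'rans'
!idx                     # all FALSE, because none of them is 0

x <- c(10, 20, 30, 40)
x[!idx] <- NA            # an all-FALSE logical index selects nothing
x                        # unchanged: 10 20 30 40
```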

Second option
```
df[,2][!rans %in% df] <- NA  
```
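'df' is a data.frame, so %in% treats it as a list of its columns and compares each element of 'rans' against whole columns rather than against the individual values inside them. None of them match, so the result is all FALSE again.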
```
rans %in% df
#[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
```
By negating the above, all the elements are now TRUE. That logical index has length 30 and is recycled across the 151 values of the 2nd column, so it selects every value, and assigning NA to them gives a column that is entirely NA.
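
Both effects, the failed match against the data.frame and the recycling of the short logical index, can be reproduced on a toy data.frame. The data and the names `mask` and `x` below are assumptions for illustration, not the OP's values:

```r
df   <- data.frame(id = 1:6, a = c(0.4, 1.2, 0.9, 2.1, 1.7, 0.3))
rans <- c(2, 5)

rans %in% df             # FALSE FALSE: compared against whole columns, not their values

mask <- !rans %in% df    # %in% binds tighter than !, so this is !(rans %in% df)
mask                     # TRUE TRUE

x <- df$a
x[mask] <- NA            # the length-2 all-TRUE index is recycled over all 6 values
x                        # every element is now NA
```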
