Reputation: 318
I am attempting to replace NA values in my data frame based on the value of a logical (0/1) column in the same data frame.
#Creating random example data frame
a <- rbinom(1000,1,.5)
b <- rbinom(1000,1,.75)
c <- rbinom(1000,1,.25)
d <- rbinom(1000,1,.5)
e <- rbinom(1000,1,.5) # Will be the logical column
df <- cbind(a,b,c,d)
for(i in 1:1000){
  if(sum(df[i,1:4]) > 2){
    df[i,1:4] <- NA
  }
}
# randomly replacing some of the NA to represent the observation data
df[sample(1:length(df), 100, replace=F)] <- 1
df <- cbind(df, e)
I am attempting to fill in the NAs with 0 when e == 1, while still retaining the random 1s I placed in the other 4 columns (especially those where the rest of the values are NA).
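To make the intended behaviour concrete (illustrative values only):
# row with e == 1:  a = NA, b = 1, c = NA, d = NA  ->  a = 0, b = 1, c = 0, d = 0
# row with e == 0:  a = NA, b = 1, c = NA, d = NA  ->  unchanged (the NAs stay NA)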
I've tried creating loops like:
for(i in 1:nrow(df)){
  if(df[,'e']==1){
    df[i,is.na(df[i,1:4])] <- 0
  }
}
However, that clears both my logical column and my observation data.
The data frame that I want to apply this to is large (2.8 million rows × 23 columns) and contains metadata and observation data, so something that takes speed into account would be great.
Upvotes: 1
Views: 98
Reputation: 887521
We can do this with data.table
library(data.table)
df1 <- as.data.frame(df)
setDT(df1)
for(j in 1:4){
  set(df1, i = which(df1[['e']]==1 & is.na(df1[[j]])), j = j, value = 0)
}
This should be efficient because it uses set: according to the help page (?set), calling set avoids the overhead of [.data.table.
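For comparison, the same update can also be written as a single := call inside [.data.table (a sketch, assuming the column names a:d from the example above):
cols <- c("a", "b", "c", "d")
# for rows where e == 1, replace NA with 0 in columns a:d
df1[e == 1, (cols) := lapply(.SD, function(x) replace(x, is.na(x), 0)), .SDcols = cols]
The result is the same; set simply skips the [.data.table dispatch on each iteration of the loop.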
As @thelatemail mentioned, a compact base R option would be
df[,1:4][df[,"e"]==1 & is.na(df[,1:4])] <- 0
If the matrix is very big, the intermediate logical index matrix created by this approach will be just as big, and that could potentially create memory-related issues.
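If speed matters at the 2.8-million-row scale, the two approaches are easy to time against each other (a sketch, assuming the microbenchmark package is available; df is the matrix built in the question):
library(microbenchmark)
microbenchmark(
  base = {
    m <- df                                   # work on a copy so each run starts from the original
    m[, 1:4][m[, "e"] == 1 & is.na(m[, 1:4])] <- 0
  },
  set = {
    d <- as.data.table(df)                    # fresh data.table each run (conversion time included)
    for (j in 1:4) set(d, i = which(d[["e"]] == 1 & is.na(d[[j]])), j = j, value = 0)
  },
  times = 10
)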
Upvotes: 1