Reputation: 1740
I have a dataframe with 22 variables (I can't post it here since the data is confidential). I need to remove all cases that have all NAs in variables 4 through 22. So, if a case has at least one non-NA value in variables 4 through 22, I have to keep it. It doesn't matter whether the first three variables contain NA values, but I also need to keep those three variables in my dataframe.
I'm trying this code:
df<-df[rowSums(is.na(df[,c(4:22)]))==19]
But I'm getting an error:
Error in `[.data.frame`(df, rowSums(is.na(df[, c(4:22)])) == 19) :
undefined columns selected
Does anyone have any suggestions on what to do? Thanks!
Upvotes: 1
Views: 465
Reputation: 20095
You are very close to the solution. To drop the columns in 4:22 that are entirely NA, you can use colSums on columns 4:22. Also, prepend three TRUE values so that the first 3 columns stay selected.
df[c(rep(TRUE,3),colSums(is.na(df[4:22])) != nrow(df) )]
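If OP instead wants to exclude rows with all NA values in columns 4:22, the logical vector has to index rows rather than columns. Add the comma after it (its absence is what triggers "undefined columns selected") and test != 19 instead of == 19, since == 19 would keep only the all-NA rows: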
df[rowSums(is.na(df[,c(4:22)])) != 19, ]
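The row-wise filter can be checked on a small made-up data frame. This is only a sketch: the column names, the values, and the range 4:6 (standing in for 4:22) are assumptions for illustration, not the OP's data. It counts non-NA values instead of comparing against a hard-coded 19, so it does not depend on how many columns are checked.

```r
# Hypothetical toy data: columns 1-3 are always kept, columns 4-6 are checked for NA.
df <- data.frame(
  id   = 1:4,
  grp  = c("a", "b", NA, "d"),
  flag = c(NA, 1, 0, 1),
  v1   = c(1, NA, NA, NA),
  v2   = c(NA, NA, 2, NA),
  v3   = c(NA, NA, NA, NA)
)

check_cols <- 4:6  # stands in for 4:22 in the question

# TRUE for rows where at least one checked column is not NA
keep <- rowSums(!is.na(df[, check_cols])) > 0

# The comma matters: the logical vector goes in the row position,
# and leaving the column position empty keeps every column.
result <- df[keep, ]
```

Here rows 2 and 4 are NA across all checked columns, so only rows 1 and 3 remain, and the NA values in the first three columns do not affect the result.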
Applying the column-wise solution to the sample data frame with 8 columns shown further down drops V6, which is entirely NA:
df[c(rep(TRUE,3),colSums(is.na(df[4:8])) != nrow(df) )]
#    ID Status V1 V2 V3 V4 V5
# 1   1      0  1  0  0  0  1
# 2   1      0  1  0  0  0  1
# 3   1      1  1  1  1  1  1
# 4   2      0  2  0  0  0  2
# 5   2      1  2  1  1  1  2
# 6   2     NA  2 NA NA NA  2
# 7   3      0  3  0  0  0  3
# 8   3      1  3  1  1  1  3
# 9   3     NA  3 NA NA NA  3
# 10  3     NA  3 NA NA NA  3
Sample data.frame
   ID Status V1 V2 V3 V4 V5 V6
1   1      0  1  0  0  0  1 NA
2   1      0  1  0  0  0  1 NA
3   1      1  1  1  1  1  1 NA
4   2      0  2  0  0  0  2 NA
5   2      1  2  1  1  1  2 NA
6   2     NA  2 NA NA NA  2 NA
7   3      0  3  0  0  0  3 NA
8   3      1  3  1  1  1  3 NA
9   3     NA  3 NA NA NA  3 NA
10  3     NA  3 NA NA NA  3 NA
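To make the sample reproducible, the printed data frame can be rebuilt roughly as follows. The column types are an assumption (the printout does not show whether they are numeric or integer), and the names cols_kept and rows_kept are introduced here only for illustration.

```r
# Rebuild the sample data frame from the printout (types assumed numeric).
df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Status = c(0, 0, 1, 0, 1, NA, 0, 1, NA, NA),
  V1     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  V2     = c(0, 0, 1, 0, 1, NA, 0, 1, NA, NA),
  V3     = c(0, 0, 1, 0, 1, NA, 0, 1, NA, NA),
  V4     = c(0, 0, 1, 0, 1, NA, 0, 1, NA, NA),
  V5     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  V6     = NA  # recycled to all 10 rows
)

# Column-wise: keep the first 3 columns, drop any of columns 4:8 that is all NA.
cols_kept <- df[c(rep(TRUE, 3), colSums(is.na(df[4:8])) != nrow(df))]

# Row-wise: drop rows where all 5 of columns 4:8 are NA.
rows_kept <- df[rowSums(is.na(df[, 4:8])) != 5, ]
```

On this sample the column-wise version removes V6, while the row-wise version keeps every row, because V5 is never NA.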
Upvotes: 2