J. Doe

Reputation: 1740

R - How to exclude cases based on the number of NA values in certain variables

I have a dataframe with 22 variables (I can't post it here since the data is confidential). I need to remove all cases that have only NAs in variables 4 through 22. So, if a case has at least one non-NA value in variables 4 through 22, I have to keep it. Whether the first three variables contain NA values is irrelevant, but I also need to keep those three variables in my dataframe.

I'm trying this code:

df<-df[rowSums(is.na(df[,c(4:22)]))==19]

But I'm getting an error:

Error in `[.data.frame`(df, rowSums(is.na(df[, c(4:22)])) == 19) : 
undefined columns selected

Does anyone have any suggestions on what to do? Thanks!

Upvotes: 1

Views: 465

Answers (1)

MKR

Reputation: 20095

You are very close to a solution. To drop the columns among 4:22 that contain only NA values, you can use colSums on columns 4:22. Prepend three TRUE values so that the first 3 columns are always kept.

df[c(rep(TRUE,3),colSums(is.na(df[4:22])) != nrow(df) )]

If the OP wants to exclude rows in which all values in columns 4:22 are NA, then the solution could be:

df[rowSums(is.na(df[,c(4:22)])) != 19, ]
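
To make the difference from the code in the question explicit, here is a minimal sketch on a made-up data frame. The name `toy`, its columns, and `check_cols` are hypothetical stand-ins for the confidential data, with columns 4:6 playing the role of 4:22. Two things change compared with the original attempt: the comparison is `!=` rather than `==` (so all-NA rows are dropped, not kept), and there is a comma after the condition so that it is used as a row index. With a single index, a data frame treats the logical vector as a column selector, which is the likely source of the "undefined columns selected" error.

```r
# Hypothetical toy data: columns 1-3 are always kept, columns 4-6 are checked for NAs
toy <- data.frame(
  id   = 1:4,
  grp  = c("a", "a", "b", "b"),
  flag = c(NA, 1, 0, NA),
  x    = c(1, NA, NA, 4),
  y    = c(NA, NA, NA, 5),
  z    = c(2, NA, 3, NA)
)

check_cols <- 4:6  # assumption: stands in for 4:22 in the real data

# Number of NA values per row, counted only over the checked columns
na_per_row <- rowSums(is.na(toy[, check_cols]))

# Keep rows with at least one non-NA value; the trailing comma selects rows
kept <- toy[na_per_row != length(check_cols), ]
```

Using `length(check_cols)` instead of a hard-coded 19 avoids a mismatch if the column range changes later. In this toy data only row 2 is NA in all checked columns, so it is the only row removed; NAs in `flag` do not affect the result.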

Applying the first (column-wise) solution to a dataframe with 8 columns, where V6 contains only NA values and is therefore dropped:

df[c(rep(TRUE,3),colSums(is.na(df[4:8])) != nrow(df) )]
#    ID Status V1 V2 V3 V4 V5
# 1   1      0  1  0  0  0  1
# 2   1      0  1  0  0  0  1
# 3   1      1  1  1  1  1  1
# 4   2      0  2  0  0  0  2
# 5   2      1  2  1  1  1  2
# 6   2     NA  2 NA NA NA  2
# 7   3      0  3  0  0  0  3
# 8   3      1  3  1  1  1  3
# 9   3     NA  3 NA NA NA  3
# 10  3     NA  3 NA NA NA  3

Sample data.frame

   ID Status V1 V2 V3 V4 V5 V6
1   1      0  1  0  0  0  1 NA
2   1      0  1  0  0  0  1 NA
3   1      1  1  1  1  1  1 NA
4   2      0  2  0  0  0  2 NA
5   2      1  2  1  1  1  2 NA
6   2     NA  2 NA NA NA  2 NA
7   3      0  3  0  0  0  3 NA
8   3      1  3  1  1  1  3 NA
9   3     NA  3 NA NA NA  3 NA
10  3     NA  3 NA NA NA  3 NA
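
For anyone who wants to run the example, the printed sample can be rebuilt roughly as follows. This is a sketch reconstructed from the table above; the column types (numeric, with V6 as a logical NA column) are an assumption, since the printout does not show them.

```r
# Rebuild the sample data.frame from the printed table (column types are assumed)
df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Status = c(0, 0, 1, 0, 1, NA, 0, 1, NA, NA),
  V1     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  V2     = c(0, 0, 1, 0, 1, NA, 0, 1, NA, NA),
  V3     = c(0, 0, 1, 0, 1, NA, 0, 1, NA, NA),
  V4     = c(0, 0, 1, 0, 1, NA, 0, 1, NA, NA),
  V5     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  V6     = NA  # recycled to all 10 rows
)

# Column-wise: always keep columns 1-3, drop columns in 4:8 that are entirely NA
by_col <- df[c(rep(TRUE, 3), colSums(is.na(df[4:8])) != nrow(df))]

# Row-wise: drop rows whose columns 4:8 are all NA (5 columns checked)
by_row <- df[rowSums(is.na(df[, 4:8])) != 5, ]
```

On this sample the column-wise version removes V6, while the row-wise version removes nothing, because V5 is never NA and so every row has at least one non-NA value in columns 4:8.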

Upvotes: 2
