M.L.
M.L.

Reputation: 129

Remove rows with missing data in select columns, only if they don't have missing data in all columns (preferably use complete.cases)

So I'm trying to remove rows that have missing data in some columns, but not those that have missing data in all columns.

using rowSums alongside !is.na() gave me 1000's of rows of NA at the bottom of my dataset. The top answer here provided a good way of solving my issue using complete.cases:

Remove rows with all or some NAs (missing values) in data.frame

i.e.

data_set1 <- data_set1[complete.cases(data_set1[11:103]), ]

However, that only allows me to remove rows with any missing data in the specified columns. I'm struggling to get complete.cases to play along with rowSums and stop it from removing rows with all missing data.

Any advice very much appreciated!

Upvotes: 1

Views: 228

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388817

Try using rowSums like :

cols <- 11:103
vals <- rowSums(is.na(data_set1[cols]))
data_set2 <- data_set1[!(vals > 0 & vals < length(cols)), ]

Or with complete.cases and rowSums

data_set1[complete.cases(data_set1[cols]) | 
         rowSums(is.na(data_set1[cols])) == length(cols) , ]

With reproducible example,

df <- data.frame(a = c(1, 2, 3, NA, 1), b = c(NA, 2, 3, NA, NA), c = 1:5)
cols <- 1:2

vals <- rowSums(is.na(df[cols]))
df[!(vals > 0 & vals < length(cols)), ]

#   a  b c
#2  2  2 2
#3  3  3 3
#4 NA NA 4

Upvotes: 2

Related Questions