Syed Ahmed

Reputation: 209

Removal of NA's in specific columns R

Many similar questions have been asked on this forum, and I've gone through most of them, but my problem is a bit different. I have a data frame:

A  B  C  key
NA NA NA LIMA
3  1  NA GAMMA
NA NA NA SIGNA
NA 2  NA BETA
NA NA 4  SIGMA

I want to remove the rows in which every column of the feature set (A, B, C) is NA. Rows with at least one non-NA feature should be kept. The key is never empty and will never have missing values. So the resulting frame would look like:

A  B  C  key
3  1  NA GAMMA
NA 2  NA BETA
NA NA 4  SIGMA

The structure of a similar data frame (with more feature columns) can be copied from below:

structure(list(A = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), B = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), C = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), D = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), E = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), F = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), G = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), H = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), I = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), J = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), K = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), L = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), M = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), N = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), O = c(0, 2, NA, NA, 1, 1), P = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), Q = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), R = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), S = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), T = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), key = c("django", 
"lima", "bravo", "alpha", "gamma", 
"meta")), row.names = c(100077L, 93143L, 244634L, 25010L, 
117228L, 147983L), class = "data.frame")

Any help would be appreciated.

Upvotes: 1

Views: 223

Answers (2)

Ronak Shah

Reputation: 388982

You can use rowSums to keep the rows that have at least one non-NA value in the chosen columns:

cols <- c('A', 'B','C')
df[rowSums(!is.na(df[cols])) != 0, ]
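
As a quick check, here is a minimal sketch that rebuilds the example frame from the question and applies the same rowSums filter. The names df, non_na_count and result are only illustrative, and stringsAsFactors = FALSE is set so that key stays a character vector on older R versions:

```r
# Rebuild the example data frame from the question (assumed to be named df)
df <- data.frame(
  A   = c(NA, 3, NA, NA, NA),
  B   = c(NA, 1, NA, 2, NA),
  C   = c(NA, NA, NA, NA, 4),
  key = c("LIMA", "GAMMA", "SIGNA", "BETA", "SIGMA"),
  stringsAsFactors = FALSE
)

cols <- c("A", "B", "C")

# !is.na(df[cols]) is a logical matrix; rowSums() counts the non-NA
# feature values in each row, so a count of 0 means every feature is NA
non_na_count <- rowSums(!is.na(df[cols]))

# Keep only the rows with at least one non-NA feature
result <- df[non_na_count != 0, ]
print(result)
```

Only the rows where all three feature columns are NA are dropped; rows with a partial set of values are kept.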

With dplyr, you could use across:

library(dplyr)
df %>% filter(Reduce(`|`, across(all_of(cols), ~!is.na(.))))
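
If your dplyr is version 1.0.4 or later, if_any() is probably a tidier way to express the same condition, since it does the row-wise "or" itself. The sketch below assumes the example frame from the question; the names df and kept are only illustrative:

```r
library(dplyr)

# Example data frame from the question (assumed to be named df)
df <- data.frame(
  A   = c(NA, 3, NA, NA, NA),
  B   = c(NA, 1, NA, 2, NA),
  C   = c(NA, NA, NA, NA, 4),
  key = c("LIMA", "GAMMA", "SIGNA", "BETA", "SIGMA"),
  stringsAsFactors = FALSE
)

cols <- c("A", "B", "C")

# if_any() is TRUE for a row when the predicate holds for at least one
# of the selected columns, i.e. when at least one feature is not NA
kept <- df %>% filter(if_any(all_of(cols), ~ !is.na(.x)))
print(kept)
```

The opposite helper, if_all(), would instead keep only the rows where every selected column is non-NA, which is not what the question asks for.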

Upvotes: 5

Gregor Thomas

Reputation: 145775

## generate a list of feature columns however you like
feature_cols = c("A", "B", "C")
## keep rows where there are fewer NAs (in the feature columns) than feature columns
new_data = old_data[rowSums(is.na(old_data[feature_cols])) < length(feature_cols), ] 
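
For a wide frame like the one in the question, one way to build the feature list is to take every column except key. This is a sketch on a reduced version of the question's structure (only columns A, B and O are kept here, and the default row names are used), so treat the data as an assumption:

```r
# Reduced version of the question's structure: A and B are entirely NA,
# O has a few values, key is always present
old_data <- data.frame(
  A   = rep(NA_real_, 6),
  B   = rep(NA_real_, 6),
  O   = c(0, 2, NA, NA, 1, 1),
  key = c("django", "lima", "bravo", "alpha", "gamma", "meta"),
  stringsAsFactors = FALSE
)

# Treat every column except the key as a feature column
feature_cols <- setdiff(names(old_data), "key")

# Count the NAs per row across the feature columns; a row is dropped
# only when that count equals the number of feature columns
na_count <- rowSums(is.na(old_data[feature_cols]))
new_data <- old_data[na_count < length(feature_cols), ]
print(new_data)
```

Note that a value of 0 in O counts as present, so the "django" row is kept.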

Upvotes: 1
