Reputation: 209
Many similar questions have been asked on this forum, and I've gone through most of them, but my problem is a bit different. I have a data frame:
A B C key
NA NA NA LIMA
3 1 NA GAMMA
NA NA NA SIGNA
NA 2 NA BETA
NA NA 4 SIGMA
I want to remove the rows in which every column of the feature set (A, B, C) is NA. The key column is never empty and never has missing values. The resulting data frame would look like:
A B C key
3 1 NA GAMMA
NA 2 NA BETA
NA NA 4 SIGMA
The structure of a similar data frame can be copied from below:
structure(list(A = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), B = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), C = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), D = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), E = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), F = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), G = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), H = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), I = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), J = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), K = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), L = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), M = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), N = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), O = c(0, 2, NA, NA, 1, 1), P = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), Q = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), R = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), S = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), T = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), key = c("django",
"lima", "bravo", "alpha", "gamma",
"meta")), row.names = c(100077L, 93143L, 244634L, 25010L,
117228L, 147983L), class = "data.frame")
Any help would be appreciated.
Upvotes: 1
Views: 223
Reputation: 388982
You can use rowSums:
cols <- c('A', 'B', 'C')
# keep rows that have at least one non-NA value in the feature columns
df[rowSums(!is.na(df[cols])) != 0, ]
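To see what this does on the small example from the question, here is a minimal sketch. The data frame is rebuilt by hand from the table in the post, and the names non_missing and result are only for illustration.

```r
# Rebuild the small example from the question (values copied from the post)
df <- data.frame(
  A   = c(NA, 3, NA, NA, NA),
  B   = c(NA, 1, NA, 2, NA),
  C   = c(NA, NA, NA, NA, 4),
  key = c("LIMA", "GAMMA", "SIGNA", "BETA", "SIGMA")
)

cols <- c("A", "B", "C")

# Count the non-missing feature values in each row
non_missing <- rowSums(!is.na(df[cols]))

# Keep only the rows with at least one non-missing feature value
result <- df[non_missing != 0, ]
print(result)
```

The rows keyed LIMA and SIGNA have no non-missing feature values, so they are the ones dropped.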
Using dplyr, you could use across:
library(dplyr)
df %>% filter(Reduce(`|`, across(all_of(cols), ~!is.na(.))))
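If your dplyr is recent enough (this assumes version 1.0.4 or later, where if_any is available), the same filter can probably be written without Reduce. This is only a sketch of that alternative, using the example data from the question rebuilt by hand.

```r
library(dplyr)

# Example data copied from the table in the question
df <- data.frame(
  A   = c(NA, 3, NA, NA, NA),
  B   = c(NA, 1, NA, 2, NA),
  C   = c(NA, NA, NA, NA, 4),
  key = c("LIMA", "GAMMA", "SIGNA", "BETA", "SIGMA")
)

cols <- c("A", "B", "C")

# Keep rows where any of the feature columns is not NA
result <- df %>% filter(if_any(all_of(cols), ~ !is.na(.x)))
print(result)
```

if_any combines the per-column conditions with a logical OR, which is what the Reduce call with `|` does explicitly.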
Upvotes: 5
Reputation: 145775
## generate a list of feature columns however you like
feature_cols = c("A", "B", "C")
## keep rows where there are fewer NAs (in the feature columns) than feature columns
new_data = old_data[rowSums(is.na(old_data[feature_cols])) < length(feature_cols), ]
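One possible way to generate the feature columns, assuming everything except key is a feature, is setdiff on the column names. The sketch below uses a trimmed-down subset (columns N, O, P and key) of the dput output in the question and leaves out the original row names, so treat it as an illustration rather than the full data.

```r
# Subset of the question's dput data: only columns N, O, P and key
old_data <- data.frame(
  N   = rep(NA_real_, 6),
  O   = c(0, 2, NA, NA, 1, 1),
  P   = rep(NA_real_, 6),
  key = c("django", "lima", "bravo", "alpha", "gamma", "meta")
)

# Assumption: every column other than "key" is a feature column
feature_cols <- setdiff(names(old_data), "key")

# Keep rows where fewer feature values are NA than there are feature columns
all_missing <- rowSums(is.na(old_data[feature_cols])) == length(feature_cols)
new_data <- old_data[!all_missing, ]
print(new_data)
```

Here only bravo and alpha are missing in all three feature columns, so those two rows are removed.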
Upvotes: 1