Reputation: 1078
I have a data frame of around 1000 columns. I am interested in 14 satisfaction rating variables.
I need to remove rows where any of the 14 rating variables contains "Item skipped" or NA.
Is there a way I can remove all rows where NA or "Item skipped" appears in my vector of satisfaction rating variables of interest, currently in a vector 'cols'. In below example 'cols' contains Service, Efficiency and Flavour but NOT Experience and Quality
cols = c("Service","Efficiency","Flavour")
dat<-data.frame(Number = 1:6, University = c("A","B","C","D","E","F"),
Service=c("Satisfied","Item skipped",NA, "Not satisfied", "Neither","Item skipped" ),
Efficiency =c("Neither", "Neither", "Item skipped","Satisfied", NA, NA),
Flavour =c("Satisfied", NA, "Item skipped",
"Neither", NA, NA), Quality =c("Not satisfied", "Neither", NA,"Satisfied", NA, NA),
Experience =c("Satisfied", NA, NA,
"Not satisfied", NA, NA),Age =rep(c(18:19), times =3))
Upvotes: 0
Views: 58
Reputation: 389275
In base R, we can use rowSums
to remove rows where "Item skipped" or NA
is present in cols
cols = c("Service", "Efficiency", "Flavour")
dat[rowSums(dat[cols] == "Item skipped" | is.na(dat[cols])) == 0, ]
# Number University Service Efficiency Flavour Quality Experience Age
#1 1 A Satisfied Neither Satisfied Not satisfied Satisfied 18
#4 4 D Not satisfied Satisfied Neither Satisfied Not satisfied 19
An alternative using apply
suggested by @amrrs
dat[!apply(dat[cols], 1, function(x) any(x == 'Item skipped' | is.na(x))), ]
Upvotes: 2
Reputation: 13319
EDIT:: Using updated data we could use(this is on the assumption that NA always appears with "Item_Skipped" which seems to be the case):
dat %>%
filter(!is.na(Experience))
Number University Service Efficiency Flavour Quality Experience Age
1 1 A Satisfied Neither Satisfied Not satisfied Satisfied 18
2 4 D Not satisfied Satisfied Neither Satisfied Not satisfied 19
Original::
We could use(data in Note below):
dat %>%
filter_at(vars(contains("rating")),all_vars(.!="Item Skipped"))
OR::
dat %>%
filter_all(all_vars(.!="Item Skipped"))
Output:
Number University Service_rating Efficiency_rating Flavour_rating Age
1 1 A Satisfied Neither Satisfied 18
2 4 D Not satisfied Satisfied Neither 19
NOTE:
dat<-data.frame(Number = 1:6, University = c("A","B","C","D","E","F"),
Service_rating=c("Satisfied","Item skipped",NA, "Not satisfied", "Neither","Item skipped" ),
Efficiency_rating =c("Neither", "Neither", "Item skipped","Satisfied", NA, NA),
Flavour_rating =c("Satisfied", NA, "Item skipped",
"Neither", NA, NA), Age =rep(c(18:19), times =3))
Upvotes: 0