monkeyshines
monkeyshines

Reputation: 1078

Remove rows where multiple conditions are met over selected columns

I have a data frame of around 1000 columns. I am interested in 14 satisfaction rating variables.

I need to remove rows where any of the 14 rating variables contains "Item skipped" or NA.

Is there a way I can remove all rows where NA or "Item skipped" appears in my vector of satisfaction rating variables of interest, currently in a vector 'cols'. In below example 'cols' contains Service, Efficiency and Flavour but NOT Experience and Quality

cols = c("Service","Efficiency","Flavour")
dat<-data.frame(Number = 1:6, University = c("A","B","C","D","E","F"), 
                Service=c("Satisfied","Item skipped",NA, "Not satisfied", "Neither","Item skipped" ), 
                Efficiency =c("Neither", "Neither", "Item skipped","Satisfied", NA, NA),
                Flavour =c("Satisfied", NA, "Item skipped",
                                     "Neither", NA, NA), Quality =c("Not satisfied", "Neither", NA,"Satisfied", NA, NA),
                Experience =c("Satisfied", NA, NA,
                                     "Not satisfied", NA, NA),Age =rep(c(18:19), times =3))

Upvotes: 0

Views: 58

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389275

In base R, we can use rowSums to remove rows where "Item skipped" or NA is present in cols

cols = c("Service", "Efficiency", "Flavour")

dat[rowSums(dat[cols] == "Item skipped" | is.na(dat[cols])) == 0, ]

#  Number University       Service Efficiency   Flavour       Quality    Experience Age
#1      1          A     Satisfied    Neither Satisfied Not satisfied     Satisfied  18
#4      4          D Not satisfied  Satisfied   Neither     Satisfied Not satisfied  19

An alternative using apply suggested by @amrrs

dat[!apply(dat[cols], 1, function(x) any(x == 'Item skipped' | is.na(x))), ]

Upvotes: 2

NelsonGon
NelsonGon

Reputation: 13319

EDIT:: Using updated data we could use(this is on the assumption that NA always appears with "Item_Skipped" which seems to be the case):

 dat %>% 

     filter(!is.na(Experience))  

        Number University       Service Efficiency   Flavour       Quality    Experience Age
    1       1          A     Satisfied    Neither Satisfied Not satisfied     Satisfied  18
    2       4          D Not satisfied  Satisfied   Neither     Satisfied Not satisfied  19

Original::

We could use(data in Note below):

dat %>% 
  filter_at(vars(contains("rating")),all_vars(.!="Item Skipped"))

OR::

dat %>% 
  filter_all(all_vars(.!="Item Skipped"))

Output:

 Number University Service_rating Efficiency_rating Flavour_rating Age
1      1          A      Satisfied           Neither      Satisfied  18
2      4          D  Not satisfied         Satisfied        Neither  19

NOTE:

dat<-data.frame(Number = 1:6, University = c("A","B","C","D","E","F"), 
                Service_rating=c("Satisfied","Item skipped",NA, "Not satisfied", "Neither","Item skipped" ), 
                Efficiency_rating =c("Neither", "Neither", "Item skipped","Satisfied", NA, NA),
                Flavour_rating =c("Satisfied", NA, "Item skipped",
                                  "Neither", NA, NA), Age =rep(c(18:19), times =3))

Upvotes: 0

Related Questions