Filter a dataframe by its entry

Question

How do I filter a data frame by a specific value that can occur anywhere in it, not necessarily in any one column or row?

Suppose I have a data frame like this:

   id gender group Student_Math_1 Student_Math_2 Student_Read_1 Student_Read_2
   46      M   Red             23             45             37             56
   46      M   Red             34             36             33             78
   46      M   Red             56             63             58             NA
   62      F  Blue             59             NA             NA             68
   62      F  Blue             NA             68             87             73
   38      M   Red             78             57             NA             65
   38      M   Red             NA             75             54             NA
   17      F  Blue             74             NA             56             72
   17      F  Blue             75             61             NA             79
   17      F  Blue             NA             74             43             81

I am trying to subset this data frame so that I retain every row that contains the value 68, regardless of which column it occurs in.

The final output would be

   id gender group Student_Math_1 Student_Math_2 Student_Read_1 Student_Read_2
   62      F  Blue             59             NA             NA             68
   62      F  Blue             NA             68             87             73

Any tips or suggestions are welcome. Thanks in advance.

df = structure(list(id = c(46, 46, 46, 62, 62, 38, 38, 17, 17, 17), 
    gender = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 
    1L), .Label = c("F", "M"), class = "factor"), group = structure(c(2L, 
    2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L), .Label = c("Blue", "Red"
    ), class = "factor"), Student_Math_1 = c(23, 34, 56, 59, 
    NA, 78, NA, 74, 75, NA), Student_Math_2 = c(45, 36, 63, NA, 
    68, 57, 75, NA, 61, 74), Student_Read_1 = c(37, 33, 58, NA, 
    87, NA, 54, 56, NA, 43), Student_Read_2 = c(56, 78, NA, 68, 
    73, 65, NA, 72, 79, 81)), .Names = c("id", "gender", "group", 
"Student_Math_1", "Student_Math_2", "Student_Read_1", "Student_Read_2"
), row.names = c(NA, -10L), class = "data.frame")

989 · Accepted Answer

Alternatively,

df[unique(which(df == 68, arr.ind = TRUE)[, 1]), ]

#  id gender group Student_Math_1 Student_Math_2 Student_Read_1 Student_Read_2
#5 62      F  Blue             NA             68             87             73
#4 62      F  Blue             59             NA             NA             68

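With this approach, you don't need to care about which column the value is in or where the NAs appear. unique is used in case 68 appears more than once in a row. Rows are returned in the order in which which() finds the matches (column by column), which is why row 5 is listed before row 4.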
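To make the mechanics a bit more concrete, here is a minimal sketch on a cut-down data frame (three rows copied from the question, with only four of the columns). The names hits, by_which and by_rowsums are just illustrative, and the rowSums variant is an alternative I am adding for comparison, not part of the answer above.

```r
# Cut-down stand-in for the question's data (rows 3-5, four columns only).
df <- data.frame(
  id             = c(46, 62, 62),
  gender         = factor(c("M", "F", "F")),
  Student_Math_2 = c(63, NA, 68),
  Student_Read_2 = c(NA, 68, 73)
)

# which(..., arr.ind = TRUE) returns one (row, col) pair per matching cell;
# cells where the comparison is NA are skipped.
hits <- which(df == 68, arr.ind = TRUE)

# Sorting the unique row indices restores the original row order.
by_which <- df[sort(unique(hits[, "row"])), ]

# Alternative: count matches per row, treating NA as "no match",
# and keep rows with at least one match.
by_rowsums <- df[rowSums(df == 68, na.rm = TRUE) > 0, ]

print(by_which)
print(by_rowsums)
```

Both forms should select the same rows here. The logical-index version keeps the original row order without needing sort or unique, which may be preferable if the order matters.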
