user2165857

Reputation: 2690

R: remove all rows that are missing data based on columns

I have the following sample dataframe in R:

Test <- data.frame("Individual"=c("John", "John", "Alice", "Alice", "Alice", "Eve", "Eve","Eve","Jack"), "ExamNumber"=c("Test1", "Test2", "Test1", "Test2", "Test3", "Test1", "Test2", "Test3",  "Test3"))

Which gives:

  Individual ExamNumber
1       John      Test1
2       John      Test2
3      Alice      Test1
4      Alice      Test2
5      Alice      Test3
6        Eve      Test1
7        Eve      Test2
8        Eve      Test3
9       Jack      Test3

However, I want to remove any Individual who does not have all three tests, to get the following result:

  Individual ExamNumber
1      Alice      Test1
2      Alice      Test2
3      Alice      Test3
4        Eve      Test1
5        Eve      Test2
6        Eve      Test3

Upvotes: 0

Views: 101

Answers (3)

Sathish

Reputation: 12703

Using base R

ind_eq3 <- names( which( with( Test, by( Test, 
                                         INDICES = list(Individual), 
                                         FUN = function(x) length(unique(x$ExamNumber)) == 3) ) ) )
with(Test, Test[ Individual %in% ind_eq3, ] )

#   Individual ExamNumber
# 3      Alice      Test1
# 4      Alice      Test2
# 5      Alice      Test3
# 6        Eve      Test1
# 7        Eve      Test2
# 8        Eve      Test3
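
The same check can probably be sketched more compactly with tapply. This assumes the Test data frame from the question; the names exam_counts, complete_ids and result are illustrative and not part of the original answer.

```r
# Sample data from the question (stringsAsFactors is set explicitly so the
# behaviour is the same before and after R 4.0)
Test <- data.frame(
  Individual = c("John", "John", "Alice", "Alice", "Alice",
                 "Eve", "Eve", "Eve", "Jack"),
  ExamNumber = c("Test1", "Test2", "Test1", "Test2", "Test3",
                 "Test1", "Test2", "Test3", "Test3"),
  stringsAsFactors = FALSE
)

# Count the distinct exams recorded for each Individual
exam_counts <- tapply(Test$ExamNumber, Test$Individual,
                      function(x) length(unique(x)))

# Names of the Individuals who have exactly three distinct exams
complete_ids <- names(exam_counts)[exam_counts == 3]

# Keep only the rows belonging to those Individuals
result <- Test[Test$Individual %in% complete_ids, ]
result
```

Like the by() version, this hard-codes 3 as the expected number of exams, so it would need adjusting if the set of exams changes.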

Using data.table

library('data.table')
setDT(Test)[ , 
             j  = .SD[length( unique(ExamNumber) ) == 3, ],
             by = 'Individual']

Upvotes: 2

JasonWang

Reputation: 2434

Here is another way using dplyr to check whether all three tests exist within groups:

library(dplyr)
Test %>% 
  group_by(Individual) %>%
  filter(all(c("Test1", "Test2", "Test3") %in% ExamNumber)) %>%
  ungroup()

# A tibble: 6 × 2
#   Individual ExamNumber
#       <fctr>     <fctr>
# 1      Alice      Test1
# 2      Alice      Test2
# 3      Alice      Test3
# 4        Eve      Test1
# 5        Eve      Test2
# 6        Eve      Test3

Upvotes: 3

d.b

Reputation: 32538

You can use ave to group by Individual and, using NROW, check whether the row count for each group is 3:

Test[ave(seq_len(nrow(Test)), Test$Individual, FUN = NROW) == 3, ]
#  Individual ExamNumber
#3      Alice      Test1
#4      Alice      Test2
#5      Alice      Test3
#6        Eve      Test1
#7        Eve      Test2
#8        Eve      Test3

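And here is a slightly more robust approach using the same idea, but with split. It keeps an Individual only if every exam that appears anywhere in Test also appears in that Individual's rows: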

Test[order(Test$Individual),][unlist(lapply(split(Test, Test$Individual), function(a)
          rep(all(unique(Test$ExamNumber) %in% a$ExamNumber), NROW(a)))),]
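
One caveat on the ave version: it counts rows, so it relies on each Individual having at most one row per exam. The sketch below illustrates the difference by counting distinct exams instead. The extra row for John is hypothetical and added only for illustration, and the names Test_dup, n_rows, n_exams, by_rows and by_exams are not from the original answer.

```r
Test <- data.frame(
  Individual = c("John", "John", "Alice", "Alice", "Alice",
                 "Eve", "Eve", "Eve", "Jack"),
  ExamNumber = c("Test1", "Test2", "Test1", "Test2", "Test3",
                 "Test1", "Test2", "Test3", "Test3"),
  stringsAsFactors = FALSE
)

# Hypothetical extra row: John has Test1 recorded twice, so he has
# three rows but only two distinct exams
Test_dup <- rbind(Test,
                  data.frame(Individual = "John", ExamNumber = "Test1",
                             stringsAsFactors = FALSE))

# Row count per Individual, repeated for every row of the group
n_rows <- ave(seq_len(nrow(Test_dup)), Test_dup$Individual, FUN = length)

# Distinct exam count per Individual; the exams are converted to integer
# codes so that ave() returns an integer vector
exam_code <- as.integer(factor(Test_dup$ExamNumber))
n_exams <- ave(exam_code, Test_dup$Individual,
               FUN = function(x) length(unique(x)))

by_rows  <- Test_dup[n_rows == 3, ]   # still includes John's three rows
by_exams <- Test_dup[n_exams == 3, ]  # only Alice and Eve remain
```

On the original Test, where no exam is repeated, both filters should return the same six rows.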

Upvotes: 2
