MeiLing
MeiLing

Reputation: 33

Any idea on how to delete rows based on conditions in R?

I'm new to R and I have been using the dplyr package to summarise. I couldn't figure out a solution to the following problem.

My tibble: 8 x 3 is as below:

df <- tibble(id = c(1, 1, 1, 1, 1, 2, 2, 2 ),
                 block = c(2, 2, 2, 3, 3, 2, 2, 3), 
                 answer = "TRUE")
id block answer
1 2 TRUE
1 2 TRUE
1 2 TRUE
1 3 TRUE
1 3 TRUE
2 2 TRUE
2 2 TRUE
2 3 TRUE

ID -> participant ID

block -> highest correct block

I want to retain only the highest block score for each participant but the highest block score is defined as the highest score with at least two correct (TRUE) in each block condition.

In the example above, the participant 1's highest block score would be 3 as there are at least two correct responses at block 3. The highest score for participant 2 is 2. See the example below.

The final output should be like this:

id block answer
1 3 TRUE
2 2 TRUE

Anyone has any idea on how I should start? Or any package that can solve this problem?

I managed to maintain the last row for each participant using the slice_tail argument but not sure how to proceed from there

Upvotes: 3

Views: 204

Answers (3)

SBMVNO
SBMVNO

Reputation: 642

Here is a suggestion if you would like to use base R only.

s=lapply(1:nrow(df),
    function(x) 
      df[df$id==x, ][ as.vector(max(df[df$id==x, 'block'])==df[df$id==x, 'block']), ]
    )

ss=do.call(rbind.data.frame, s)
unique(ss)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388862

filter to keep only those blocks that have >= 2 TRUE values, later for each id keep the row with max block value.

library(dplyr)

df %>%
  group_by(id, block) %>%
  filter(sum(as.logical(answer)) >= 2) %>%
  group_by(id) %>%
  slice(which.max(block)) %>%
  ungroup

#    id block answer
#  <dbl> <dbl> <chr> 
#1     1     3 TRUE  
#2     2     2 TRUE  

Upvotes: 5

Kra.P
Kra.P

Reputation: 15123

This will work

library(dplyr)

df %>%
  group_by(id,block) %>%
  filter(answer == TRUE) %>%
  filter(n() > 1) %>%
  distinct() %>%
  group_by(id) %>%
  filter(block == max(block))

     id block answer
  <dbl> <dbl> <chr> 
1     1     3 TRUE  
2     2     2 TRUE 

Upvotes: 3

Related Questions