Reputation: 33
I'm new to R and I have been using the dplyr package to summarise. I couldn't figure out a solution to the following problem.
My tibble: 8 x 3 is as below:
df <- tibble(id = c(1, 1, 1, 1, 1, 2, 2, 2 ),
block = c(2, 2, 2, 3, 3, 2, 2, 3),
answer = "TRUE")
id | block | answer |
---|---|---|
1 | 2 | TRUE |
1 | 2 | TRUE |
1 | 2 | TRUE |
1 | 3 | TRUE |
1 | 3 | TRUE |
2 | 2 | TRUE |
2 | 2 | TRUE |
2 | 3 | TRUE |
ID -> participant ID
block -> highest correct block
I want to retain only the highest block score for each participant but the highest block score is defined as the highest score with at least two correct (TRUE) in each block condition.
In the example above, the participant 1's highest block score would be 3 as there are at least two correct responses at block 3. The highest score for participant 2 is 2. See the example below.
The final output should be like this:
id | block | answer |
---|---|---|
1 | 3 | TRUE |
2 | 2 | TRUE |
Anyone has any idea on how I should start? Or any package that can solve this problem?
I managed to maintain the last row for each participant using the slice_tail
argument but not sure how to proceed from there
Upvotes: 3
Views: 204
Reputation: 642
Here is a suggestion if you would like to use base R only.
s=lapply(1:nrow(df),
function(x)
df[df$id==x, ][ as.vector(max(df[df$id==x, 'block'])==df[df$id==x, 'block']), ]
)
ss=do.call(rbind.data.frame, s)
unique(ss)
Upvotes: 1
Reputation: 388862
filter
to keep only those blocks that have >= 2
TRUE
values, later for each id
keep the row with max
block
value.
library(dplyr)
df %>%
group_by(id, block) %>%
filter(sum(as.logical(answer)) >= 2) %>%
group_by(id) %>%
slice(which.max(block)) %>%
ungroup
# id block answer
# <dbl> <dbl> <chr>
#1 1 3 TRUE
#2 2 2 TRUE
Upvotes: 5
Reputation: 15123
This will work
library(dplyr)
df %>%
group_by(id,block) %>%
filter(answer == TRUE) %>%
filter(n() > 1) %>%
distinct() %>%
group_by(id) %>%
filter(block == max(block))
id block answer
<dbl> <dbl> <chr>
1 1 3 TRUE
2 2 2 TRUE
Upvotes: 3