Reputation: 3
I am trying to calculate the number of correct answers on a test and store it in a new column. Recall.CRESP is a column specifying the correct answers on the test, selected through grid coordinates. Recall.RESP shows the participant's response.
These columns look something like this:
|Recall.CRESP |Recall.RESP |
|---------------------------------|---------------------------------|
|grid35grid51grid12grid43grid54 |grid35grid51grid12grid43grid54 |
|grid11gird42gird22grid51grid32 |grid11gird15gird55grid42grid32 |
So, for example, in row 1 of this table the participant got 5/5 correct, as the grid coordinates of Recall.CRESP match those of Recall.RESP. In row 2, however, the participant only got 2/5 correct, as only the first and the last grid coordinates are identical. The order of the coordinates must match to be correct.
My new column should show 5 and 2 for the two rows respectively. I am unsure how to split apart the grid coordinates and also to tell R the order must match to be correct.
Upvotes: 0
Views: 49
Reputation: 24238
You can do this without tidyverse, using a simple mapply and a custom split_grid function (I assume only the numbers are relevant):
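library(tibble)  # used only to build and print the example data; the scoring is base R
df <- tibble(
  Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"),
  Recall.RESP  = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32")
)

# Extract the runs of digits from one string, e.g. "grid35grid51" -> "35" "51"
split_grid <- function(x) {
  unlist(regmatches(x, gregexpr("[[:digit:]]+", x)))
}

# Count the positions at which the two coordinate sequences agree
compare <- function(x, y) {
  sum(split_grid(x) == split_grid(y))
}

df$Res <- mapply(compare, df$Recall.CRESP, df$Recall.RESP)
df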
# A tibble: 2 x 3
  Recall.CRESP                   Recall.RESP                      Res
  <chr>                          <chr>                          <int>
1 grid35grid51grid12grid43grid54 grid35grid51grid12grid43grid54     5
2 grid11gird42gird22grid51grid32 grid11gird15gird55grid42grid32     2
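To see why this respects the order of the coordinates, it may help to run the helper on row 2 by hand. This is only a small sketch that reuses split_grid from above; the names cresp and resp are made up for the illustration. The == operator compares the two vectors position by position, so a coordinate that appears in both strings but in a different place (42 here) does not count.

```r
# Same helper as in the answer: keep only the digit runs of each coordinate.
split_grid <- function(x) {
  unlist(regmatches(x, gregexpr("[[:digit:]]+", x)))
}

# Row 2 of the example data (hypothetical variable names)
cresp <- split_grid("grid11gird42gird22grid51grid32")
resp  <- split_grid("grid11gird15gird55grid42grid32")

cresp               # "11" "42" "22" "51" "32"
resp                # "11" "15" "55" "42" "32"
cresp == resp       # TRUE FALSE FALSE FALSE TRUE
sum(cresp == resp)  # 2
```

Note that this comparison assumes both strings contain the same number of coordinates; if they can differ, R will recycle the shorter vector, so a length check would be worth adding.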
Upvotes: 0
Reputation: 43354
A nice way to handle this is with list columns, wherein you can store a whole set of responses or values in a way that is easy to iterate over. In tidyverse grammar,
library(tidyverse)
responses <- tibble(
  Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"),
  Recall.RESP  = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32")
)
scored <- responses %>%
  # turn each string into a character vector of its digits; the "grid"/"gird" prefix is ignored
  mutate(across(everything(), ~ str_extract_all(.x, "[0-9]+"))) %>%
  mutate(correct = map2(Recall.CRESP, Recall.RESP, `==`),  # position-by-position comparison
         score = map_int(correct, sum))                    # number of matching positions
scored
#> # A tibble: 2 x 4
#>   Recall.CRESP Recall.RESP correct   score
#>   <list>       <list>      <list>    <int>
#> 1 <chr [5]>    <chr [5]>   <lgl [5]>     5
#> 2 <chr [5]>    <chr [5]>   <lgl [5]>     2
Pull out the individual columns if you'd like a closer look at the data.
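As a rough sketch of what pulling out the columns could look like: the block below rebuilds the scored table so it runs on its own, assuming a recent tidyverse (dplyr 1.0 or later for across, tidyr 1.0 or later for unnest) and extracting the digits with str_extract_all. The name long is made up for the illustration.

```r
library(tidyverse)

responses <- tibble(
  Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"),
  Recall.RESP  = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32")
)

scored <- responses %>%
  mutate(across(everything(), ~ str_extract_all(.x, "[0-9]+"))) %>%
  mutate(correct = map2(Recall.CRESP, Recall.RESP, `==`),
         score = map_int(correct, sum))

# Look at a single cell of a list column with [[ ]]
scored$Recall.CRESP[[2]]  # "11" "42" "22" "51" "32"
scored$correct[[2]]       # TRUE FALSE FALSE FALSE TRUE

# Or expand the list columns to one row per coordinate (hypothetical name `long`)
long <- scored %>%
  unnest(cols = c(Recall.CRESP, Recall.RESP, correct))
long
```

The unnested table has ten rows, five per participant, with score repeated on each row, which makes it easy to check individual coordinates by eye.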
Upvotes: 1