Rboy

Reputation: 3

Creating a third column based on matching strings from two other columns

I am trying to create a new column holding each participant's score on a test. Recall.CRESP is a column specifying the correct answers, which are selected as grid coordinates. Recall.RESP holds the participant's response.

These columns look something like this:

|Recall.CRESP                     |Recall.RESP                      |
|---------------------------------|---------------------------------|           
|grid35grid51grid12grid43grid54   |grid35grid51grid12grid43grid54   |                
|grid11gird42gird22grid51grid32   |grid11gird15gird55grid42grid32   |

So, for example, in row 1 of this table the participant got 5/5 correct, as the grid coordinates in Recall.CRESP match those in Recall.RESP. In row 2, however, the participant got only 2/5 correct, as only the first and last grid coordinates are identical. A coordinate counts as correct only if it appears at the same position in both columns.

My new column should show 5 and 2 for the two rows respectively. I am unsure how to split apart the grid coordinates, and how to tell R that the order must match for a coordinate to count as correct.

Upvotes: 0

Views: 49

Answers (2)

m0nhawk

Reputation: 24238

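The scoring itself needs nothing from the tidyverse: a simple mapply and a custom split_grid function are enough (I assume only the numbers are relevant). The tibble is only used to build and print the example data:

library(tibble)  # only for building and printing the example data

df <- tibble(Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"),
             Recall.RESP = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32"))

# Extract the digit groups ("35", "51", ...) from one string, in order
split_grid <- function(x) {
    unlist(regmatches(x, gregexpr("[[:digit:]]+", x)))
}

# Count the positions at which the two coordinate sequences agree
compare <- function(x, y) {
    sum(split_grid(x) == split_grid(y))
}

# USE.NAMES = FALSE stops mapply from naming the result after the input strings
df$Res <- mapply(compare, df$Recall.CRESP, df$Recall.RESP, USE.NAMES = FALSE)

df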

#> # A tibble: 2 x 3
#>   Recall.CRESP                   Recall.RESP                      Res
#>   <chr>                          <chr>                          <int>
#> 1 grid35grid51grid12grid43grid54 grid35grid51grid12grid43grid54     5
#> 2 grid11gird42gird22grid51grid32 grid11gird15gird55grid42grid32     2
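
To see why this counts position by position, it may help to run the helper on the second row by hand. The sketch below repeats split_grid from above and adds a hypothetical compare_safe variant (my own illustration, not required for the data shown) that only compares the positions both strings have, on the assumption that a participant might enter fewer coordinates than expected:

```r
# Extract the digit groups from one response string, in order
split_grid <- function(x) {
    unlist(regmatches(x, gregexpr("[[:digit:]]+", x)))
}

cresp <- "grid11gird42gird22grid51grid32"
resp  <- "grid11gird15gird55grid42grid32"

split_grid(cresp)                        # "11" "42" "22" "51" "32"
split_grid(resp)                         # "11" "15" "55" "42" "32"
split_grid(cresp) == split_grid(resp)    # element-wise, so order matters

# Hypothetical variant: compare only the positions present in both
# strings, so a shorter response does not trigger vector recycling
compare_safe <- function(x, y) {
    a <- split_grid(x)
    b <- split_grid(y)
    n <- min(length(a), length(b))
    sum(a[seq_len(n)] == b[seq_len(n)])
}

compare_safe(cresp, resp)
compare_safe(cresp, "grid11gird42")      # shorter response, no recycling
```

Whether missing coordinates should simply count as wrong, as they do here, depends on how the test is meant to be scored.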

Upvotes: 0

alistaire

Reputation: 43354

A nice way to handle this is with list columns, wherein you can store a whole set of responses or values in a way that is easy to iterate over. In tidyverse grammar,

library(tidyverse)

responses <- tibble(Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"), 
                    Recall.RESP = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32"))

# Extract the digit groups in order. Splitting on '[^^]g[ri]{2}d' would also
# swallow the digit before each "grid"/"gird", so only first digits would be compared.
scored <- responses %>% 
    mutate_all(~str_extract_all(.x, '[0-9]+')) %>%
    mutate(correct = map2(Recall.CRESP, Recall.RESP, `==`),    # position-wise match
           score = map_int(correct, sum))                      # number of TRUEs per row

scored
#> # A tibble: 2 x 4
#>   Recall.CRESP Recall.RESP correct   score
#>   <list>       <list>      <list>    <int>
#> 1 <chr [5]>    <chr [5]>   <lgl [5]>     5
#> 2 <chr [5]>    <chr [5]>   <lgl [5]>     2

Pull out the individual columns if you'd like a closer look at the data.
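
As a rough sketch of what that can look like, the block below rebuilds the example and then indexes into the list columns with `[[`. The coordinates are extracted here with stringr's str_extract_all, which is one possible way to build the list columns, and the prop column is a hypothetical extra added only for illustration:

```r
library(tidyverse)

responses <- tibble(
    Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"),
    Recall.RESP  = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32")
)

scored <- responses %>%
    mutate_all(~str_extract_all(.x, "[0-9]+")) %>%    # one character vector per row
    mutate(correct = map2(Recall.CRESP, Recall.RESP, `==`),
           score = map_int(correct, sum))

# Each list column holds one vector per participant
scored$Recall.RESP[[2]]    # coordinates the second participant entered
scored$correct[[2]]        # which positions matched

# Hypothetical extra column: proportion correct per participant
with_prop <- scored %>% mutate(prop = map_dbl(correct, mean))
with_prop %>% select(score, prop)
```

Keeping correct as a list column means later summaries, such as accuracy by position, can reuse it without re-parsing the strings.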

Upvotes: 1
