spaolizzi
spaolizzi

Reputation: 13

Summing consecutive values, broken up by specific value, in R

I'm having a trouble figuring out how to group variables to achieve the desired result from dplyr. I have an experimental dataset set up like this:

  subject   task_phase  block_number trial_number ResponseCorrect
   <chr>     <chr>              <dbl>        <dbl>           <dbl>
 1 268301377    1            1            2               1
 2 268301377    1            1            3               1
 3 268301377    1            1            4               1
 4 268301377    1            2            2              -1
 5 268301377    1            2            3               1
 6 268301377    1            2            4               1
 7 268301377    1            3            2               1
 8 268301377    1            3            3              -1
 9 268301377    1            3            4               1
10 268301377    2            1           50               1
11 268301377    2            1           51               1
12 268301377    2            1           52               1
13 268301377    2            2           37              -1
14 268301377    2            2           38               1
15 268301377    2            2           39               1
16 268301377    2            3           41              -1
17 268301377    2            3           42              -1
18 268301377    2            3           43               1

I'm hoping to sum the consecutive "correct" responses, and to have this tally "reset" each time there was an incorrect response:

  subject   task_phase  block_number trial_number ResponseCorrect   ConsecutiveCorrect
   <chr>     <chr>              <dbl>        <dbl>       <dbl>            <dbl>
 1 268301377    1            1            1               1                 1
 2 268301377    1            1            2               1                 2
 3 268301377    1            1            3               1                 3
 4 268301377    1            2            1              -1                 0
 5 268301377    1            2            2               1                 1
 6 268301377    1            2            3               1                 2
 7 268301377    1            3            1               1                 1
 8 268301377    1            3            2              -1                 0
 9 268301377    1            3            3               1                 1
10 268301377    2            1            1               1                 1
11 268301377    2            1            2               1                 2
12 268301377    2            1            3               1                 3
13 268301377    2            2            1              -1                 0
14 268301377    2            2            2               1                 1
15 268301377    2            2            3               1                 2
16 268301377    2            3            1              -1                 0
17 268301377    2            3            2              -1                 0
18 268301377    2            3            3               1                 1

I originally thought I could do something along the lines of df %>% group_by(subject, task_phase, block_number, ResponseCorrect) %>% mutate(ConsecutiveCorrect = cumsum(ResponseCorrect), and that almost works. But, it doesn't give a consecutive value: it just sums up the total number of correct responses per block (. I'm essentially trying to use the -1s as break points that start the summation over again.

Is there a grouping function (Tidyverse or otherwise) that I'm not aware of that could do something along these lines?

Upvotes: 1

Views: 335

Answers (2)

akrun
akrun

Reputation: 887851

An option with data.table. Grouped by 'subject', 'task_phase', 'block_number', get the run-length-id (rleid) of 'ResponseCorrect', return with rowid of that sequence, multiply with a logical vector so that elements that corresponds to -1 (FALSE -> 0 will return 0 and TRUE -> 1 returns the element)

library(data.table)
setDT(df)[, ConsecutiveCorrect := rowid(rleid(ResponseCorrect)) *
         (ResponseCorrect == 1), by = .(subject, task_phase, block_number)]

-output

df
      subject task_phase block_number trial_number ResponseCorrect ConsecutiveCorrect
 1: 268301377          1            1            2               1                  1
 2: 268301377          1            1            3               1                  2
 3: 268301377          1            1            4               1                  3
 4: 268301377          1            2            2              -1                  0
 5: 268301377          1            2            3               1                  1
 6: 268301377          1            2            4               1                  2
 7: 268301377          1            3            2               1                  1
 8: 268301377          1            3            3              -1                  0
 9: 268301377          1            3            4               1                  1
10: 268301377          2            1           50               1                  1
11: 268301377          2            1           51               1                  2
12: 268301377          2            1           52               1                  3
13: 268301377          2            2           37              -1                  0
14: 268301377          2            2           38               1                  1
15: 268301377          2            2           39               1                  2
16: 268301377          2            3           41              -1                  0
17: 268301377          2            3           42              -1                  0
18: 268301377          2            3           43               1                  1

data

df <- structure(list(subject = c(268301377L, 268301377L, 268301377L, 
268301377L, 268301377L, 268301377L, 268301377L, 268301377L, 268301377L, 
268301377L, 268301377L, 268301377L, 268301377L, 268301377L, 268301377L, 
268301377L, 268301377L, 268301377L), task_phase = c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    block_number = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 
    1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), trial_number = c(2L, 3L, 
    4L, 2L, 3L, 4L, 2L, 3L, 4L, 50L, 51L, 52L, 37L, 38L, 39L, 
    41L, 42L, 43L), ResponseCorrect = c(1L, 1L, 1L, -1L, 1L, 
    1L, 1L, -1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, -1L, -1L, 1L)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18"))

Upvotes: 1

Martin Gal
Martin Gal

Reputation: 16998

You could try

library(dplyr)

data %>% 
  group_by(
    subject, 
    task_phase, 
    block_number, 
    grp = lag(cumsum(ResponseCorrect == -1), default = 0)
    ) %>% 
  mutate(ConsecutiveCorrect = ifelse(ResponseCorrect == -1, 0, cumsum(ResponseCorrect))) %>% 
  ungroup() %>% 
  select(-grp)

which returns

# A tibble: 18 x 6
     subject task_phase block_number trial_number ResponseCorrect ConsecutiveCorrect
       <dbl>      <dbl>        <dbl>        <dbl>           <dbl>              <dbl>
 1 268301377          1            1            2               1                  1
 2 268301377          1            1            3               1                  2
 3 268301377          1            1            4               1                  3
 4 268301377          1            2            2              -1                  0
 5 268301377          1            2            3               1                  1
 6 268301377          1            2            4               1                  2
 7 268301377          1            3            2               1                  1
 8 268301377          1            3            3              -1                  0
 9 268301377          1            3            4               1                  1
10 268301377          2            1           50               1                  1
11 268301377          2            1           51               1                  2
12 268301377          2            1           52               1                  3
13 268301377          2            2           37              -1                  0
14 268301377          2            2           38               1                  1
15 268301377          2            2           39               1                  2
16 268301377          2            3           41              -1                  0
17 268301377          2            3           42              -1                  0
18 268301377          2            3           43               1                  1

Upvotes: 2

Related Questions