chumpbucket
chumpbucket

Reputation: 1

How do I sum a column of values consecutively row by row within certain column IDs?

This is a pretty complicated one, sorry ahead of time!

I am attempting to add column values in one column consecutively (CURRENT_FIX_DURATION), with a new column created with the values going up, but only within specified rows (as specified by TRIAL_INDEX, 1-160, within ID, 75 separate IDs).

Here is part of my data frame:

       ID TRIAL_INDEX CURRENT_FIX_DURATION CURRENT_FIX_INDEX
1 bb10jml2           1                  462                 1
2 bb10jml2           1                  166                 2
3 bb10jml2           1                   60                 3
4 bb10jml2           1                  118                 4
5 bb10jml2           1                   60                 5
  CURRENT_FIX_INTEREST_AREA_INDEX
1                               5
2                               3
3                               .
4                               4
5                               .

There are many 160 trials for each, and 75 separate IDs, with varying numbers of numbers to be added in the column CURRENT_FIX_DURATION.

I would want to be able to add up the #s for CURRENT_FIX_DURATION, with the summing stopping at the end of a trial, and repeating for the next trial.

Here's a sample output of what I would want to achieve:

CURRENT_FIX_DURATION
462
628
688
806
866

I would want this to continue until it reached TRIAL_INDEX 2, and then start over with the beginning of the next value not being summed in with the previous TRIAL_INDEX's CURRENT_FIX_DURATION column.

Is this possible to achieve? I though of using for loops, but I'm not sure where to begin within a data frame. In general, the difficulty is compounded by the fact that the numbers to be added for each Subject/Trial is completely variable.

Should I convert this to long format and try with ddply?

Let me know what you think or if you would like more information!

Thank you for your time!

Upvotes: 0

Views: 87

Answers (1)

Cettt
Cettt

Reputation: 11981

Here is a solution within the tidyverse using map2 from the purrrpackage.

library(tidyverse)
mydata <- tibble(id = rep("a", 5), trial_index = rep(1, 5),
       current_fix_duration = c(462, 166, 60, 118, 60),
       current_fix_index = 1:5)


newdata <- mydata %>% group_by(id) %>% 
  mutate(current_fix_duration2 = map2_dbl(trial_index, current_fix_index, ~sum(current_fix_duration[.x:.y]))) %>%
as.data.frame()

# A tibble: 5 x 5
# Groups:   id [1]
     id trial_index current_fix_duration current_fix_index current_fix_duration2
  <chr>       <dbl>                <dbl>             <int>                 <dbl>
1     a           1                  462                 1                   462
2     a           1                  166                 2                   628
3     a           1                   60                 3                   688
4     a           1                  118                 4                   806
5     a           1                   60                 5                   866

Upvotes: 1

Related Questions