Abe
Abe

Reputation: 415

Create series of variables that are cumulative sums of other variables using dplyr

I have a series of numeric variables (in the real data they are binned income counts from the American Community Survey) from which I need to create cumulative summation variables. I know the typical advice here is to reshape the data, but I don't think that is the answer in this case (but am open to being proved wrong). Neither sum nor rowsum has worked for me.

Reprex below, followed by expected results. Essentially e=a, f=a+b, g=a+b+c, h=a+b+c+d.

library(reprex)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.0.2
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
a <- c(10,10,20,20,30)
b <- c(30,50,60,20,10)
c <- c(5,5,5,1,10)
d <- c(5,5,5,3,5)
df <- data.frame(a,b,c,d)
df <- df %>% 
  mutate(e = a, 
         f = sum(e,b), 
         g = sum(f,c), 
         h = sum(g,d))
df
#>    a  b  c d  e   f    g    h
#> 1 10 30  5 5 10 260 1326 6653
#> 2 10 50  5 5 10 260 1326 6653
#> 3 20 60  5 5 20 260 1326 6653
#> 4 20 20  1 3 20 260 1326 6653
#> 5 30 10 10 5 30 260 1326 6653
df <- df %>% 
  mutate(e = a, 
         f = rowsum(e,b), 
         g = rowsum(f,c), 
         h = sum(g,d))
#> Error: Invalid index: out of bounds
df
#>    a  b  c d  e   f    g    h
#> 1 10 30  5 5 10 260 1326 6653
#> 2 10 50  5 5 10 260 1326 6653
#> 3 20 60  5 5 20 260 1326 6653
#> 4 20 20  1 3 20 260 1326 6653
#> 5 30 10 10 5 30 260 1326 6653

Expected results:

#>    a  b  c d  e  f  g  h
#> 1 10 30  5 5 10 40 45 50
#> 2 10 50  5 5 10 60 65 70
#> 3 20 60  5 5 20 80 85 90
#> 4 20 20  1 3 20 40 41 44
#> 5 30 10 10 5 30 40 50 55

Upvotes: 1

Views: 67

Answers (2)

sconfluentus
sconfluentus

Reputation: 4993

Unless I misunderstood you, I think it is simpler than you have made things:

df <- df %>% 
mutate(e = a, 
     f = a + b, 
     g = f + c, 
     h = g + d)

The secret is that the makers of the data.frame class and its methods have built into it the vector-wise math capabilities. So, by adding two columns together it automatically does what you are hoping for.

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388907

You could take rowSums incrementing one column at a time.

df[letters[5:8]] <- do.call(cbind, lapply(seq(ncol(df)), 
                            function(x) rowSums(df[1:x])))
df

#   a  b  c d  e  f  g  h
#1 10 30  5 5 10 40 45 50
#2 10 50  5 5 10 60 65 70
#3 20 60  5 5 20 80 85 90
#4 20 20  1 3 20 40 41 44
#5 30 10 10 5 30 40 50 55

Or if you are interested in a tidyverse solution :

library(dplyr)

df %>%
  bind_cols(purrr::map_dfc(seq(ncol(df)), 
                 ~df %>% select(1:.x) %>% rowSums) %>% setNames(letters[5:8]))
                          

Upvotes: 1

Related Questions