Reputation: 415
I have a series of numeric variables (in the real data they are binned income counts from the American Community Survey) from which I need to create cumulative summation variables. I know the typical advice here is to reshape the data, but I don't think that is the answer in this case (but am open to being proved wrong). Neither sum
nor rowsum
has worked for me.
Reprex below, followed by expected results. Essentially e=a
, f=a+b
, g=a+b+c
, h=a+b+c+d
.
library(reprex)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.0.2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
a <- c(10,10,20,20,30)
b <- c(30,50,60,20,10)
c <- c(5,5,5,1,10)
d <- c(5,5,5,3,5)
df <- data.frame(a,b,c,d)
df <- df %>%
mutate(e = a,
f = sum(e,b),
g = sum(f,c),
h = sum(g,d))
df
#> a b c d e f g h
#> 1 10 30 5 5 10 260 1326 6653
#> 2 10 50 5 5 10 260 1326 6653
#> 3 20 60 5 5 20 260 1326 6653
#> 4 20 20 1 3 20 260 1326 6653
#> 5 30 10 10 5 30 260 1326 6653
df <- df %>%
mutate(e = a,
f = rowsum(e,b),
g = rowsum(f,c),
h = sum(g,d))
#> Error: Invalid index: out of bounds
df
#> a b c d e f g h
#> 1 10 30 5 5 10 260 1326 6653
#> 2 10 50 5 5 10 260 1326 6653
#> 3 20 60 5 5 20 260 1326 6653
#> 4 20 20 1 3 20 260 1326 6653
#> 5 30 10 10 5 30 260 1326 6653
Expected results:
#> a b c d e f g h
#> 1 10 30 5 5 10 40 45 50
#> 2 10 50 5 5 10 60 65 70
#> 3 20 60 5 5 20 80 85 90
#> 4 20 20 1 3 20 40 41 44
#> 5 30 10 10 5 30 40 50 55
Upvotes: 1
Views: 67
Reputation: 4993
Unless I misunderstood you, I think it is simpler than you have made things:
df <- df %>%
mutate(e = a,
f = a + b,
g = f + c,
h = g + d)
The secret is that the makers of the data.frame
class and its methods have built into it the vector-wise math capabilities. So, by adding two columns together it automatically does what you are hoping for.
Upvotes: 1
Reputation: 388907
You could take rowSums
incrementing one column at a time.
df[letters[5:8]] <- do.call(cbind, lapply(seq(ncol(df)),
function(x) rowSums(df[1:x])))
df
# a b c d e f g h
#1 10 30 5 5 10 40 45 50
#2 10 50 5 5 10 60 65 70
#3 20 60 5 5 20 80 85 90
#4 20 20 1 3 20 40 41 44
#5 30 10 10 5 30 40 50 55
Or if you are interested in a tidyverse
solution :
library(dplyr)
df %>%
bind_cols(purrr::map_dfc(seq(ncol(df)),
~df %>% select(1:.x) %>% rowSums) %>% setNames(letters[5:8]))
Upvotes: 1