Rob F
Rob F

Reputation: 45

How to sum cumulatively across columns

I'm new to R, and would like to modify a dataset so that each column contains the cumulative sum of the values in all the columns to its left (including itself). I'm aware of how to use rowSums to calculate the cumulative sum for each column separately:

df <- data.frame(
  jan = rep(1:2, each = 3),
  feb = rep(1:3, each = 2),
  mar = rep(5:4, each = 3),
  apr = rep(1:3, each = 2)
)
df

df %>%
  mutate(feb = rowSums(subset(., select = (jan:feb))),
         mar = rowSums(subset(., select = (jan:mar))),
         apr = rowSums(subset(., select = (jan:apr))))

Which produces the output I'm looking for:

  jan feb mar apr
1   1   2   7   8
2   1   2   7   8
3   1   3   8  10
4   2   4   8  10
5   2   5   9  12
6   2   5   9  12

How can I generalise this to any number of columns? I have been trying statements like this:

df %>% mutate_at(vars(-jan), ~rowSums(subset(., select = (jan:.))))

But I am not using subset correctly. Thanks in advance if you are able to help at all.

Upvotes: 2

Views: 701

Answers (6)

mferreira
mferreira

Reputation: 459

It's a litle late in the game, but if you want to keep within the tidyverse syntax, you can use a combination of pivoting to a longer format, sum by group, and then reconstitute the wider format:

df %>% 
  rowid_to_column("ID") %>%  #Create a ID column
  pivot_longer(cols = - ID) %>% 
  group_by(ID) %>% #Inteify rows as groups
  mutate(CumSum = cumsum(value)) %>% #Do the cumsum by groups
  pivot_wider(id_cols = ID, names_from = name, values_from = CumSum) #Reconstruct the wider format

Cheers

Upvotes: 1

akrun
akrun

Reputation: 887831

Another option with rowCumsums from matrixStats

library(matrixStats)
rowCumsums(as.matrix(df))

Upvotes: 1

jay.sf
jay.sf

Reputation: 73602

applying cumsum.

t(apply(df, 1, cumsum))
#      jan feb mar apr
# [1,]   1   2   7   8
# [2,]   1   2   7   8
# [3,]   1   3   8  10
# [4,]   2   4   8  10
# [5,]   2   5   9  12
# [6,]   2   5   9  12

Upvotes: 1

Ian Campbell
Ian Campbell

Reputation: 24878

Here's an alternative with Reduce:

do.call(cbind,Reduce(`+`,lapply(df,`[`,),accumulate = TRUE))
     [,1] [,2] [,3] [,4]
[1,]    1    2    7    8
[2,]    1    2    7    8
[3,]    1    3    8   10
[4,]    2    4    8   10
[5,]    2    5    9   12
[6,]    2    5    9   12

Upvotes: 1

randr
randr

Reputation: 314

I understood that the columns should be cumulatively summed. Like this:

cum.df = sapply(1:ncol(df), function(col){
    rowSums(df[1:col])
})

     [,1] [,2] [,3] [,4]
[1,]    1    2    7    8
[2,]    1    2    7    8
[3,]    1    3    8   10
[4,]    2    4    8   10
[5,]    2    5    9   12
[6,]    2    5    9   12

Is that correct?

Upvotes: 1

Daniel O
Daniel O

Reputation: 4358

Its not clear what you are asking, you should provide an example output. Does this help?

> cumsum(colSums(df))
jan feb mar apr 
  9  21  48  60 

or maybe this?

new_df <- df

for(i in 1:nrow(df)){
    new_df[i,] <- cumsum(unlist(df[i,]))
}

> new_df
  jan feb mar apr
1   1   2   7   8
2   1   2   7   8
3   1   3   8  10
4   2   4   8  10
5   2   5   9  12
6   2   5   9  12

Upvotes: 1

Related Questions