Reputation: 45
I'm new to R, and would like to modify a dataset so that each column contains the cumulative sum of the values in all the columns to its left (including itself). I'm aware of how to use rowSums
to calculate the cumulative sum for each column separately:
df <- data.frame(
jan = rep(1:2, each = 3),
feb = rep(1:3, each = 2),
mar = rep(5:4, each = 3),
apr = rep(1:3, each = 2)
)
df
df %>%
mutate(feb = rowSums(subset(., select = (jan:feb))),
mar = rowSums(subset(., select = (jan:mar))),
apr = rowSums(subset(., select = (jan:apr))))
Which produces the output I'm looking for:
jan feb mar apr
1 1 2 7 8
2 1 2 7 8
3 1 3 8 10
4 2 4 8 10
5 2 5 9 12
6 2 5 9 12
How can I generalise this to any number of columns? I have been trying statements like this:
df %>% mutate_at(vars(-jan), ~rowSums(subset(., select = (jan:.))))
But I am not using subset
correctly. Thanks in advance if you are able to help at all.
Upvotes: 2
Views: 701
Reputation: 459
It's a litle late in the game, but if you want to keep within the tidyverse syntax, you can use a combination of pivoting to a longer format, sum by group, and then reconstitute the wider format:
df %>%
rowid_to_column("ID") %>% #Create a ID column
pivot_longer(cols = - ID) %>%
group_by(ID) %>% #Inteify rows as groups
mutate(CumSum = cumsum(value)) %>% #Do the cumsum by groups
pivot_wider(id_cols = ID, names_from = name, values_from = CumSum) #Reconstruct the wider format
Cheers
Upvotes: 1
Reputation: 887831
Another option with rowCumsums
from matrixStats
library(matrixStats)
rowCumsums(as.matrix(df))
Upvotes: 1
Reputation: 73602
apply
ing cumsum
.
t(apply(df, 1, cumsum))
# jan feb mar apr
# [1,] 1 2 7 8
# [2,] 1 2 7 8
# [3,] 1 3 8 10
# [4,] 2 4 8 10
# [5,] 2 5 9 12
# [6,] 2 5 9 12
Upvotes: 1
Reputation: 24878
Here's an alternative with Reduce
:
do.call(cbind,Reduce(`+`,lapply(df,`[`,),accumulate = TRUE))
[,1] [,2] [,3] [,4]
[1,] 1 2 7 8
[2,] 1 2 7 8
[3,] 1 3 8 10
[4,] 2 4 8 10
[5,] 2 5 9 12
[6,] 2 5 9 12
Upvotes: 1
Reputation: 314
I understood that the columns should be cumulatively summed. Like this:
cum.df = sapply(1:ncol(df), function(col){
rowSums(df[1:col])
})
[,1] [,2] [,3] [,4]
[1,] 1 2 7 8
[2,] 1 2 7 8
[3,] 1 3 8 10
[4,] 2 4 8 10
[5,] 2 5 9 12
[6,] 2 5 9 12
Is that correct?
Upvotes: 1
Reputation: 4358
Its not clear what you are asking, you should provide an example output. Does this help?
> cumsum(colSums(df))
jan feb mar apr
9 21 48 60
or maybe this?
new_df <- df
for(i in 1:nrow(df)){
new_df[i,] <- cumsum(unlist(df[i,]))
}
> new_df
jan feb mar apr
1 1 2 7 8
2 1 2 7 8
3 1 3 8 10
4 2 4 8 10
5 2 5 9 12
6 2 5 9 12
Upvotes: 1