user5424264

Reputation: 105

Sum of every x rows in a data frame in R, output only in the xth row

I have a df:

  df <- data.frame(x = c(1,2,3,4,5,6,7,8,9,10,11,12,13),
                   y = c(0,0,2,0,1,0,0,0,0,3,0,4,0))

I am looking for the sum of y over every block of 4 rows, reported only at the last row of each block (rows 4, 8, 12, ...) and 0 everywhere else. This should be the output:

x   y   z
1   0   0
2   0   0
3   2   0
4   0   2
5   1   0
6   0   0
7   0   0
8   0   1
9   0   0
10  3   0
11  0   0
12  4   7
13  0   0

With dplyr I came up with the following code, which gives the result below.

  library(dplyr)  # provides %>% and n()

  a <- df %>% 
    dplyr::mutate(b = gl(ceiling(dplyr::n()/4), 4, dplyr::n())) %>%
    dplyr::group_by(b) %>%
    dplyr::mutate(z = sum(y))

x   y   z
1   0   2
2   0   2
3   2   2
4   0   2
5   1   1
6   0   1
7   0   1
8   0   1
9   0   7
10  3   7
11  0   7
12  4   7
13  0   0

But I cannot work out how to replace these values with 0 everywhere except in every 4th row. The problem is that the sum is repeated on every row of a block, including periods that contain only 0s.

Upvotes: 2

Views: 204

Answers (3)

akrun

Reputation: 887118

After grouping by 'b' (created with gl), we create the 'z' column by multiplying the group sum of 'y' by the logical vector row_number() == n(). That vector is TRUE only for the last row of each group, so every other element becomes 0.

library(dplyr)
df %>% 
    group_by(b = gl(ceiling(n()/4), 4, n())) %>%
    mutate(z = sum(y) * (row_number()== n())) %>%
    ungroup() %>%
    select(-b)
# A tibble: 13 x 3
#       x     y     z
#   <dbl> <dbl> <dbl>
# 1     1     0     0
# 2     2     0     0
# 3     3     2     0
# 4     4     0     2
# 5     5     1     0
# 6     6     0     0
# 7     7     0     0
# 8     8     0     1
# 9     9     0     0
#10    10     3     0
#11    11     0     0
#12    12     4     7
#13    13     0     0
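
One case worth checking is a final block with fewer than 4 rows. With row_number() == n(), the last row of that short block would receive its partial sum. The sketch below is a variant, not part of the original answer: it compares against the block size instead, so only complete blocks are reported. The data frame df2 and the name k are assumptions made for illustration; df2 is the question's data with a non-zero value in row 13.

```r
library(dplyr)

# Hypothetical variant of the question's data: row 13 is non-zero.
df2 <- data.frame(x = 1:13,
                  y = c(0, 0, 2, 0, 1, 0, 0, 0, 0, 3, 0, 4, 5))

k <- 4  # block size (assumed name)

res <- df2 %>%
  group_by(b = gl(ceiling(n() / k), k, n())) %>%
  # keep the sum only on the k-th row of a block, so a short
  # trailing block (here: row 13 alone) stays at 0
  mutate(z = sum(y) * (row_number() == k)) %>%
  ungroup() %>%
  select(-b)

res$z
# [1] 0 0 0 2 0 0 0 1 0 0 0 7 0
```

With row_number() == n() instead, row 13 of df2 would get z = 5.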

If an external package is acceptable, the efficient RcppRoll package gives the same output just as easily:

library(RcppRoll)
with(df, round(roll_sumr(y, n = 4, by=4,  fill = 0)))
#[1] 0 0 0 2 0 0 0 1 0 0 0 7 0

Upvotes: 5

d.b

Reputation: 32548

In base R

df$z = 0
idx = seq_along(df$y) %% 4 == 0                    # TRUE at rows 4, 8, 12, ...
# (i - 1) %/% 4 labels the blocks 0,0,0,0,1,1,1,1,... without relying on 4.01
sums = sapply(split(df$y, (seq_along(df$y) - 1) %/% 4), sum)
# drop the sum of an incomplete last block so the lengths match (no warning)
df$z = replace(df$z, idx, sums[seq_len(sum(idx))])
df$z
# [1] 0 0 0 2 0 0 0 1 0 0 0 7 0
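
As an alternative base R sketch that avoids splitting altogether, the block sums can be read off the cumulative sum at rows 4, 8, 12, and so on. The helper name block_sums and its arguments are hypothetical, introduced only for this example, and it assumes y contains no NA values.

```r
# Hypothetical helper: sum of each complete block of k values,
# placed on the block's last row; 0 everywhere else.
block_sums <- function(y, k) {
  z <- numeric(length(y))                # start with all zeros
  ends <- seq_len(length(y) %/% k) * k   # rows k, 2k, 3k, ...
  cs <- cumsum(y)
  # block sum = cumulative sum at this block end minus the previous one
  z[ends] <- diff(c(0, cs[ends]))
  z
}

df <- data.frame(x = 1:13,
                 y = c(0, 0, 2, 0, 1, 0, 0, 0, 0, 3, 0, 4, 0))
df$z <- block_sums(df$y, 4)
df$z
# [1] 0 0 0 2 0 0 0 1 0 0 0 7 0
```

Rows that belong to an incomplete trailing block (row 13 here) are left at 0.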

Upvotes: 2

Sotos

Reputation: 51592

This can be achieved easily with rollapply from the zoo package:

library(zoo)

rollapply(df$y, 4, by = 4, sum, fill = 0, align = 'right')
#[1] 0 0 0 2 0 0 0 1 0 0 0 7 0

Upvotes: 6
