Bar-Tzur
Bar-Tzur

Reputation: 85

Sum a group of columns by row count

I'm trying to create a new dataset from an existing one. The new dataset is supposed to combine 60 rows from the original dataset in order to convert a sum of events occurring each second to the total by minute. The number of columns will generally not be known in advance.

For example, with this dataset, if we split it into groups of 3 rows:

d1
  a b c d
1 1 1 0 1
2 0 1 0 1
3 0 1 0 0
4 0 0 1 0
5 0 0 1 0
6 1 0 0 0

We'll get this data.frame. Row 1 contains the column sums for rows 1-3 of d1 and Row 2 contains the column sums for rows 4-6 of d1:

d2
  a b c d
1 1 3 0 2
2 1 0 2 0

I've tried d2<-colSums(d1[seq(1,NROW(d1),3),]) which is about as close as I've been able to get.

I've also considered recommendations from How to sum rows based on multiple conditions - R?,How to select every xth row from table,Remove last N rows in data frame with the arbitrary number of rows,sum two columns in R, and Merging multiple rows into single row. I'm all out of ideas. Any help would be greatly appreciated.

Upvotes: 1

Views: 581

Answers (2)

Cristian E. Nuno
Cristian E. Nuno

Reputation: 2920

Overview

After reading Split up a dataframe by number of rows, I realized the only thing you need to know is how you'd like to split() d1.

In this case, you'd like to split d1 into multiple data frames based on every 3 rows. In this case, you use rep() to specify that you'd like each element in the sequence - 1:2 - to be repeated three times (the number of rows divided by the length of your sequence).

After that, the logic involves using map() to sum each column for each data frame created after d1 %>% split(). Here, summarize_all() is helpful since you don't need to know the column names ahead of time.

Once the calculations are complete, you use bind_rows() to stack all the observations back into one data frame.

# load necessary package ----
library(tidyverse)

# load necessary data ----
df1 <-
  read.table(text = "a b c d
1 1 0 1
0 1 0 1
0 1 0 0
0 0 1 0
0 0 1 0
1 0 0 0", header = TRUE)

# perform operations --------
df2 <-
  df1 %>%
  # split df1 into two data frames
  # based on three consecutive rows
  split(f = rep(1:2, each = nrow(.) / length(1:2))) %>%
  # for each data frame, apply the sum() function to all the columns
  map(.f = ~ .x %>% summarize_all(.funs = funs(sum))) %>%
  # collapse data frames together
  bind_rows()

# view results -----
df2
#   a b c d
# 1 1 3 0 2
# 2 1 0 2 0

# end of script #

Upvotes: 1

Rich Pauloo
Rich Pauloo

Reputation: 8412

Create a grouping variable, group_by that variable, then summarise_all.

# your data
d <- data.frame(a = c(1,0,0,0,0,1),
                b = c(1,1,1,0,0,0),
                c = c(0,0,0,1,1,1),
                d = c(1,1,0,0,0,0))

# create the grouping variable 
d$group <- rep(c("A","B"), each = 3)

# apply the mean to all columns
library(dplyr)
d %>% 
  group_by(group) %>% 
  summarise_all(funs(sum))

Returns:

# A tibble: 2 x 5
  group     a     b     c     d
  <chr> <dbl> <dbl> <dbl> <dbl>
1 A         1     3     0     2
2 B         1     0     3     0

Upvotes: 1

Related Questions