arnyeinstein
arnyeinstein

Reputation: 1013

How to aggregate a tibble by rows and columns

I want to write a function that can aggregate a tibble by combining some rows and columns. Here is a small example (not as a function) that aggregates the columns A1+A3 and the rows 1 and 3

A <- tibble(A = c("A1", "A2", "A3", "A4") ,
            A1 = c(1,2,5,3), 
            A2 = c(2,4,2,5), 
            A3 = c(3,7,1,6), 
            A4 = c(1,3,2,4))
AGG <- A %>% 
  mutate(A1 = A1 + A3) %>% 
  select(-A3)
AGG[1, 2:4]  = AGG[1,2:4] + AGG[3,2:4]
AGG <- AGG[c(1:2,4),]

The function should have as arguments the column numbers (equal to row numbers). For the example above this would be:

AGG <- aggfunc(A, c(1,3))

There might be a straightforward way, but all I can find until now are rather complicated solutions that do not work. Any help is appreciated.

Cheers Renger

Upvotes: 1

Views: 502

Answers (3)

TimTeaFan
TimTeaFan

Reputation: 18561

Here is one possible dplyr approach. It assumes that your identifying column is the first column of the data.frame. We can see that for this kind of problem a base R approach is much simpler.

library(dplyr)

# your data
A <- tibble(A = c("A1", "A2", "A3", "A4") ,
            A1 = c(1,2,5,3), 
            A2 = c(2,4,2,5), 
            A3 = c(3,7,1,6), 
            A4 = c(1,3,2,4))

# the new function
aggfunc <- function(df, indices) {

  new_colnm <- names(df)[indices[1] + 1]
  id_colnm = names(df)[1]
  
  # add columns
  temp <- df %>% 
    rowwise %>% 
    mutate(!! new_colnm := sum(c_across(indices+1))) %>%
    select(!indices[-1]+1) %>% 
    ungroup

  # add rows
  temp %>%
    rows_update(
      tibble(!! id_colnm := new_colnm,
             temp %>%
               filter(row_number() %in% indices) %>%
               summarise(across(indices[1]+1, sum))
    )) %>%
    slice(-indices[-1])
  
}

A %>% 
  aggfunc(c(1:3))

#> Matching, by = "A"
#> # A tibble: 2 x 3
#>   A        A1    A4
#>   <chr> <dbl> <dbl>
#> 1 A1       27     1
#> 2 A4       14     4

Created on 2020-11-10 by the reprex package (v0.3.0)

Upvotes: 1

arnyeinstein
arnyeinstein

Reputation: 1013

Using the first answer (as it allows to choose more than two column/rows to be aggregated, I adjusted the function slightly, so I don't loose the first column:

aggfunc <- function(A, ind) {
   target_col <- ind[1]
   to_delete <- ind[-target_col]
   newFirst <- A$A[-to_delete]
   colnFirst <- colnames(A)[1]
   AGG <- select(A, -A)

   AGG[, target_col] <- rowSums(AGG[, find])
   lastcol <- dim(AGG)[2]
   AGG[target_col, ] <- as.list(colSums(AGG[ind,]))

   AGG[-to_delete, -to_delete] %>% 
      add_column(newFirst, .before = .1) %>% 
      rename({{ colnFirst }} := newFirst)

}

Thanks to both Bas and TimTeaFan for their help. I learned a lot again.

Upvotes: 0

Bas
Bas

Reputation: 4658

I believe the following does what you want. Since it filters on row and column indices, I throw away the column A before calling the function.

library(dplyr)

A <- tibble(A = c("A1", "A2", "A3", "A4") ,
            A1 = c(1,2,5,3), 
            A2 = c(2,4,2,5), 
            A3 = c(3,7,1,6), 
            A4 = c(1,3,2,4))

aggfunc <- function(A, ind) {
  target_col <- ind[1]
  to_delete <- ind[-target_col]
  
  AGG <- A
  
  AGG[, target_col] <- rowSums(AGG[, ind])
  AGG[target_col, ] <- as.list(colSums(AGG[ind, ]))
  
  AGG[-to_delete, -to_delete]
}

aggfunc(select(A, -A), c(1,3))

which gives

# A tibble: 3 x 3
     A1    A2    A4
  <dbl> <dbl> <dbl>
1    10     4     3
2     9     4     3
3     9     5     4

Upvotes: 1

Related Questions