Reputation: 1013
I want to write a function that can aggregate a tibble by combining some rows and columns. Here is a small example (not as a function) that aggregates the columns A1+A3 and the rows 1 and 3
A <- tibble(A = c("A1", "A2", "A3", "A4") ,
A1 = c(1,2,5,3),
A2 = c(2,4,2,5),
A3 = c(3,7,1,6),
A4 = c(1,3,2,4))
AGG <- A %>%
mutate(A1 = A1 + A3) %>%
select(-A3)
AGG[1, 2:4] = AGG[1,2:4] + AGG[3,2:4]
AGG <- AGG[c(1:2,4),]
The function should have as arguments the column numbers (equal to row numbers). For the example above this would be:
AGG <- aggfunc(A, c(1,3))
There might be a straightforward way, but all I can find until now are rather complicated solutions that do not work. Any help is appreciated.
Cheers Renger
Upvotes: 1
Views: 502
Reputation: 18561
Here is one possible dplyr
approach. It assumes that your identifying column is the first column of the data.frame
. We can see that for this kind of problem a base R approach is much simpler.
library(dplyr)
# your data
A <- tibble(A = c("A1", "A2", "A3", "A4") ,
A1 = c(1,2,5,3),
A2 = c(2,4,2,5),
A3 = c(3,7,1,6),
A4 = c(1,3,2,4))
# the new function
aggfunc <- function(df, indices) {
new_colnm <- names(df)[indices[1] + 1]
id_colnm = names(df)[1]
# add columns
temp <- df %>%
rowwise %>%
mutate(!! new_colnm := sum(c_across(indices+1))) %>%
select(!indices[-1]+1) %>%
ungroup
# add rows
temp %>%
rows_update(
tibble(!! id_colnm := new_colnm,
temp %>%
filter(row_number() %in% indices) %>%
summarise(across(indices[1]+1, sum))
)) %>%
slice(-indices[-1])
}
A %>%
aggfunc(c(1:3))
#> Matching, by = "A"
#> # A tibble: 2 x 3
#> A A1 A4
#> <chr> <dbl> <dbl>
#> 1 A1 27 1
#> 2 A4 14 4
Created on 2020-11-10 by the reprex package (v0.3.0)
Upvotes: 1
Reputation: 1013
Using the first answer (as it allows to choose more than two column/rows to be aggregated, I adjusted the function slightly, so I don't loose the first column:
aggfunc <- function(A, ind) {
target_col <- ind[1]
to_delete <- ind[-target_col]
newFirst <- A$A[-to_delete]
colnFirst <- colnames(A)[1]
AGG <- select(A, -A)
AGG[, target_col] <- rowSums(AGG[, find])
lastcol <- dim(AGG)[2]
AGG[target_col, ] <- as.list(colSums(AGG[ind,]))
AGG[-to_delete, -to_delete] %>%
add_column(newFirst, .before = .1) %>%
rename({{ colnFirst }} := newFirst)
}
Thanks to both Bas and TimTeaFan for their help. I learned a lot again.
Upvotes: 0
Reputation: 4658
I believe the following does what you want. Since it filters on row and column indices, I throw away the column A
before calling the function.
library(dplyr)
A <- tibble(A = c("A1", "A2", "A3", "A4") ,
A1 = c(1,2,5,3),
A2 = c(2,4,2,5),
A3 = c(3,7,1,6),
A4 = c(1,3,2,4))
aggfunc <- function(A, ind) {
target_col <- ind[1]
to_delete <- ind[-target_col]
AGG <- A
AGG[, target_col] <- rowSums(AGG[, ind])
AGG[target_col, ] <- as.list(colSums(AGG[ind, ]))
AGG[-to_delete, -to_delete]
}
aggfunc(select(A, -A), c(1,3))
which gives
# A tibble: 3 x 3
A1 A2 A4
<dbl> <dbl> <dbl>
1 10 4 3
2 9 4 3
3 9 5 4
Upvotes: 1