anderwyang
anderwyang

Reputation: 2411

In R , how to summarize data frame in multiple dimensions

There is dataframe raw_data as below, How can i change it to wished_data in easy way ?

I currently know group_by/summarise the data serval times (and add variables) , then rbind them. But this is little boring , especially when variables more then this example in occasion.

I want to know ,if is there any general method for similar situation ? Thanks!

library(tidyverse)
country <- c('UK','US','UK','US')
category <- c("A", "B", "A", "B")
y2021 <- c(17, 42, 21, 12)
y2022 <- c(49, 23, 52, 90)

raw_data <- data.frame(country,category,y2021,y2022)

enter image description here

Upvotes: 0

Views: 306

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269371

We can use the adorn_totals function from janitor. get_totals accepts a data frame and a column and it outputs the data frame with totals for the numeric columns, one such row for each level of the specified column. It then extracts out the total rows and since adorn_totals can rearrange the column order uses select to put the order back to the original so that we can later bind mulitiple instances together. We then bind together the orignal data frame and each of the total row data frames that we want.

library(dplyr)
library(janitor)

get_totals <- function(data, col) {
  data %>%
    group_by({{col}}) %>%
    group_modify(~ adorn_totals(.)) %>%
    ungroup %>%
    filter(rowSums(. == "Total") > 0) %>%
    select(any_of(names(data)))
}

bind_rows(
  raw_data,
  get_totals(raw_data, category), 
  get_totals(raw_data, country), 
  get_totals(raw_data)
)

giving:

  country category y2021 y2022
1      UK        A    17    49
2      US        B    42    23
3      UK        A    21    52
4      US        B    12    90
5   Total        A    38   101
6   Total        B    54   113
7      UK    Total    38   101
8      US    Total    54   113
9   Total        -    92   214

Upvotes: 2

akrun
akrun

Reputation: 886938

We may use rollup/cube/groupingsets from data.table

library(data.table)
out <- rbind(setDT(raw_data), groupingsets(raw_data, j = lapply(.SD, sum), 
  by = c("country", "category"), 
    sets = list("country", "category", character())))
out[is.na(out)] <- 'TOTAL'

-output

> out
   country category y2021 y2022
    <char>   <char> <num> <num>
1:      UK        A    17    49
2:      US        B    42    23
3:      UK        A    21    52
4:      US        B    12    90
5:      UK    TOTAL    38   101
6:      US    TOTAL    54   113
7:   TOTAL        A    38   101
8:   TOTAL        B    54   113
9:   TOTAL    TOTAL    92   214

Or with cube

out <- rbind(raw_data, cube(raw_data, 
  j = .(y2021= sum(y2021), y2022=sum(y2022)), by = c("country", "category")))
out[is.na(out)] <- 'TOTAL'

Upvotes: 2

Related Questions