Reputation: 21

Conditional sum of multiple columns based on multiple (other) columns

I have a data frame of the form below:

ID <- c(1, 2, 3, 4, 5)
Type1 <- c("A", "", "A", "B", "C")
Count1 <- c(40, NA, 10, 5, 100)
Type2 <- c("D", "", "", "C", "D")
Count2 <- c(5, NA, NA, 30, 5)
Type3 <- c("E", "", "", "D", "")
Count3 <- c(10, NA, NA, 5, NA)
df <- data.frame(ID, Type1, Count1, Type2, Count2, Type3, Count3)

I would like to sum the values in the "Count" columns IF they are of the same "Type". I.e., if Type1, Type2, or Type3 match, sum the corresponding value in Count1, Count2, and Count3.

Ideally, I could get an output of the form below:

Type <- c("A", "B", "C", "D", "E")
n <- c(2, 1, 2, 3, 1)
Total <- c(50, 5, 130, 15, 10)

result <- data.frame(Type, n, Total)

I was able to achieve this using the following code, but it's quite clunky. I'm sure there is a more elegant method!

df1 <- data.frame(Type1, Count1)
df2 <- data.frame(Type2, Count2)
df3 <- data.frame(Type3, Count3)

colnames(df1) <- c("Type", "Count")
colnames(df2) <- c("Type", "Count")
colnames(df3) <- c("Type", "Count")

df_all <- rbind(df1, df2, df3)

result <- df_all %>% group_by(Type) %>% 
     summarize(num = n(),
               total = sum(Count))

Upvotes: 2

Answers (3)

jay.sf

Reputation: 73702

reshape long, then aggregate.

> df |> 
+   reshape(idvar='ID', varying=2:7, sep='', direction='long') |> 
+   aggregate(Count ~ Type, \(x) c(n=length(x), Total=sum(x))) |> 
+   do.call(what='data.frame')  ## needed to get rid of matrix column
  Type Count.n Count.Total
1    A       2          50
2    B       1           5
3    C       2         130
4    D       3          15
5    E       1          10

Upvotes: 0

LMc

Reputation: 18732

This is easily done if your data are in long format. You can get it into long format using pivot_longer() from the tidyr package. This will work for any number of columns (e.g. Type1, Type2, Type3, ... and Count1, Count2, Count3, ...)

library(tidyr)
library(dplyr)

df |>
  pivot_longer(cols = matches("^(Count|Type)"),
               names_pattern = "(^\\w+)(\\d+)", names_to = c(".value", NA)) |>
  summarize(n = n(), Total = sum(Count, na.rm = TRUE), .by = Type) |>
  filter(nzchar(Type))
#   Type      n Total
#   <chr> <int> <dbl>
# 1 A         2    50
# 2 D         3    15
# 3 E         1    10
# 4 B         1     5
# 5 C         2   130

Upvotes: 3

apple

Reputation: 500

Here is a quick way by reorganizing the data frame slightly before using summarize.

result <- bind_rows(select(df, Type = Type1, Count = Count1),
                    select(df, Type = Type2, Count = Count2),
                    select(df, Type = Type3, Count = Count3)) %>%

  group_by(Type) %>%
  summarize(n = n(),
            Total = sum(Count))

result
> # A tibble: 6 × 3
>   Type      n Total
>   <chr> <int> <dbl>
> 1 ""        6    NA
> 2 "A"       2    50
> 3 "B"       1     5
> 4 "C"       2   130
> 5 "D"       3    15
> 6 "E"       1    10

You can also convert to a data frame and remove the first (blank) column:

result <- result %>% filter(Type != "") %>%
  as.data.frame()

result
>   Type n Total
> 1    A 2    50
> 2    B 1     5
> 3    C 2   130
> 4    D 3    15
> 5    E 1    10

Upvotes: 0

Conditional sum of multiple columns based on multiple (other) columns

Answers (3)

Related Questions