Reputation: 21
I have a data frame of the form below:
ID <- c(1, 2, 3, 4, 5)
Type1 <- c("A", "", "A", "B", "C")
Count1 <- c(40, NA, 10, 5, 100)
Type2 <- c("D", "", "", "C", "D")
Count2 <- c(5, NA, NA, 30, 5)
Type3 <- c("E", "", "", "D", "")
Count3 <- c(10, NA, NA, 5, NA)
df <- data.frame(ID, Type1, Count1, Type2, Count2, Type3, Count3)
I would like to sum the values in the "Count" columns IF they are of the same "Type". I.e., if Type1, Type2, or Type3 match, sum the corresponding value in Count1, Count2, and Count3.
Ideally, I could get an output of the form below:
Type <- c("A", "B", "C", "D", "E")
n <- c(2, 1, 2, 3, 1)
Total <- c(50, 5, 130, 15, 10)
result <- data.frame(Type, n, Total)
I was able to achieve this using the following code, but it's quite clunky. I'm sure there is a more elegant method!
df1 <- data.frame(Type1, Count1)
df2 <- data.frame(Type2, Count2)
df3 <- data.frame(Type3, Count3)
colnames(df1) <- c("Type", "Count")
colnames(df2) <- c("Type", "Count")
colnames(df3) <- c("Type", "Count")
df_all <- rbind(df1, df2, df3)
result <- df_all %>% group_by(Type) %>%
summarize(num = n(),
total = sum(Count))
Upvotes: 2
Views: 58
Reputation: 73702
reshape
long, then aggregate
.
> df |>
+ reshape(idvar='ID', varying=2:7, sep='', direction='long') |>
+ aggregate(Count ~ Type, \(x) c(n=length(x), Total=sum(x))) |>
+ do.call(what='data.frame') ## needed to get rid of matrix column
Type Count.n Count.Total
1 A 2 50
2 B 1 5
3 C 2 130
4 D 3 15
5 E 1 10
Upvotes: 0
Reputation: 18732
This is easily done if your data are in long format. You can get it into long format using pivot_longer()
from the tidyr
package. This will work for any number of columns (e.g. Type1, Type2, Type3, ... and Count1, Count2, Count3, ...)
library(tidyr)
library(dplyr)
df |>
pivot_longer(cols = matches("^(Count|Type)"),
names_pattern = "(^\\w+)(\\d+)", names_to = c(".value", NA)) |>
summarize(n = n(), Total = sum(Count, na.rm = TRUE), .by = Type) |>
filter(nzchar(Type))
# Type n Total
# <chr> <int> <dbl>
# 1 A 2 50
# 2 D 3 15
# 3 E 1 10
# 4 B 1 5
# 5 C 2 130
Upvotes: 3
Reputation: 500
Here is a quick way by reorganizing the data frame slightly before using summarize
.
result <- bind_rows(select(df, Type = Type1, Count = Count1),
select(df, Type = Type2, Count = Count2),
select(df, Type = Type3, Count = Count3)) %>%
group_by(Type) %>%
summarize(n = n(),
Total = sum(Count))
result
> # A tibble: 6 × 3
> Type n Total
> <chr> <int> <dbl>
> 1 "" 6 NA
> 2 "A" 2 50
> 3 "B" 1 5
> 4 "C" 2 130
> 5 "D" 3 15
> 6 "E" 1 10
You can also convert to a data frame and remove the first (blank) column:
result <- result %>% filter(Type != "") %>%
as.data.frame()
result
> Type n Total
> 1 A 2 50
> 2 B 1 5
> 3 C 2 130
> 4 D 3 15
> 5 E 1 10
Upvotes: 0