Reputation: 71
This is my dataframe:
df_test <- data.frame(Report=c("A-1","A-2","A-3","A-4","B-1","B-2"),grp=c("A","A","A","A","B","B"),defect=c("m","m","m","c","d","e"))
For each grp, I want to figure out the number of reports, and which defect contributes to the max value of the grp, and if there is a tie then I would like to output a collapsed value of the columns that tie.
Expected output:
df_out<-data.frame(grp=c("A","B"),defect_list=c("m;c","d;e"),nreports=c(4,2),max_defect_list=c("m","d;e"),max_defect=c(3,1))
Upvotes: 1
Views: 52
Reputation: 887088
Using data.table
, we can also do
library(data.table)
setDT(df_test)[, n := .N, .(grp, defect)][,
.(defect_list = paste(unique(defect), collapse=";"),
nreports = .N,
max_defect_list = paste(unique(defect[which(n == max(n))]), collapse = ';'),
max_defect = max(n)), .(grp)]
# grp defect_list nreports max_defect_list max_defect
#1: A m;c 4 m 3
#2: B d;e 2 d;e 1
Upvotes: 1
Reputation: 14764
One way would be:
library(dplyr)
df_test %>%
group_by(grp) %>%
add_count(defect) %>%
summarise(
defect_list = paste(unique(defect), collapse = ';'),
nreports = n(),
max_defect_list = paste(unique(defect[which(n == max(n))]), collapse = ';'),
max_defect = max(n)
)
Output:
# A tibble: 2 x 5
grp defect_list nreports max_defect_list max_defect
<fct> <chr> <int> <chr> <int>
1 A m;c 4 m 3
2 B d;e 2 d;e 1
Upvotes: 2