Reputation: 4243
I have a dataframe as such:
Col1 Col2 Col3
partner1 A 20
partner1 B 70
partner2 A 30
partner2 C 20
partner3 B 50
partner3 C 40
How do I transform this so that it aggregates by just Col1 and then shows the percentages from Col2 in a new column:
Col1 Col3 Col4
partner1 90 A: 22.2%, B: 77.7%
partner2 50 A: 60.0%, C: 40.0%
partner3 90 B: 55.6%, C: 44.4%
Thanks!
Upvotes: 2
Views: 52
Reputation: 12723
using prop.table
for Col3
.
library('data.table')
df1[, .(col3 = sum(Col3), Col4 = list( setNames(prop.table(Col3)*100, unique(Col2)))), by = 'Col1']
# Col1 col3 Col4
# 1: partner1 90 22.22222,77.77778
# 2: partner2 50 60,40
# 3: partner3 90 55.55556,44.44444
str(df1[, .(col3 = sum(Col3), Col4 = list( setNames(prop.table(Col3)*100, unique(Col2)))), by = 'Col1'])
# Classes ‘data.table’ and 'data.frame': 3 obs. of 3 variables:
# $ Col1: chr "partner1" "partner2" "partner3"
# $ col3: int 90 50 90
# $ Col4:List of 3
# ..$ : Named num 22.2 77.8
# .. ..- attr(*, "names")= chr "A" "B"
# ..$ : Named num 60 40
# .. ..- attr(*, "names")= chr "A" "C"
# ..$ : Named num 55.6 44.4
# .. ..- attr(*, "names")= chr "B" "C"
# - attr(*, ".internal.selfref")=<externalptr>
Upvotes: 1
Reputation: 32558
Here's a solution with base R. It involves using split
based on grouping variable and then summarizing results from the sub-groups.
do.call(rbind, lapply(split(df, df$Col1), function(a)
cbind(a$Col1[1], sum(a[,3]), paste(sapply(split(a, a$Col2), function(b)
paste(b$Col2[1],":",round(100*sum(b[,3])/sum(a[,3]),2),"%", sep = "")
), collapse = " "))))
# [,1] [,2] [,3]
#[1,] "partner1" "90" "A:22.22% B:77.78%"
#[2,] "partner2" "50" "A:60% C:40%"
#[3,] "partner3" "90" "B:55.56% C:44.44%"
Upvotes: 1
Reputation: 215127
You need summarize
, and format the string properly, here is an option with sprintf
, assuming Col2
is unique within each group of Col1
:
df %>%
group_by(Col1) %>%
# use %s: %.1f%% to format Col2 as string, and the percentage as float rounded to one
# decimal place %.1f and the percentage symbol %%
summarise(Col4 = toString(sprintf("%s: %.1f%%", Col2, Col3 * 100/sum(Col3))),
Col3 = sum(Col3))
# A tibble: 3 × 3
# Col1 Col4 Col3
# <fctr> <chr> <int>
#1 partner1 A: 22.2%, B: 77.8% 90
#2 partner2 A: 60.0%, C: 40.0% 50
#3 partner3 B: 55.6%, C: 44.4% 90
Upvotes: 3