Nick Knauer
Nick Knauer

Reputation: 4243

Obtain Percentage of each value and add to next column (after aggregation)

I have a dataframe as such:

    Col1       Col2       Col3            
partner1          A         20      
partner1          B         70
partner2          A         30
partner2          C         20
partner3          B         50
partner3          C         40

How do I transform this so that it aggregates by just Col1 and then shows the percentages from Col2 in a new column:

    Col1       Col3                    Col4            
partner1         90      A: 22.2%, B: 77.7% 
partner2         50      A: 60.0%, C: 40.0%
partner3         90      B: 55.6%, C: 44.4%

Thanks!

Upvotes: 2

Views: 52

Answers (3)

Sathish
Sathish

Reputation: 12723

using prop.table for Col3.

library('data.table')
df1[, .(col3 = sum(Col3), Col4 = list( setNames(prop.table(Col3)*100, unique(Col2)))), by = 'Col1']
#       Col1 col3              Col4
# 1: partner1   90 22.22222,77.77778
# 2: partner2   50             60,40
# 3: partner3   90 55.55556,44.44444

str(df1[, .(col3 = sum(Col3), Col4 = list( setNames(prop.table(Col3)*100, unique(Col2)))), by = 'Col1'])
# Classes ‘data.table’ and 'data.frame':    3 obs. of  3 variables:
#   $ Col1: chr  "partner1" "partner2" "partner3"
# $ col3: int  90 50 90
# $ Col4:List of 3
# ..$ : Named num  22.2 77.8
# .. ..- attr(*, "names")= chr  "A" "B"
# ..$ : Named num  60 40
# .. ..- attr(*, "names")= chr  "A" "C"
# ..$ : Named num  55.6 44.4
# .. ..- attr(*, "names")= chr  "B" "C"
# - attr(*, ".internal.selfref")=<externalptr> 

Upvotes: 1

d.b
d.b

Reputation: 32558

Here's a solution with base R. It involves using split based on grouping variable and then summarizing results from the sub-groups.

do.call(rbind, lapply(split(df, df$Col1), function(a)
    cbind(a$Col1[1], sum(a[,3]), paste(sapply(split(a, a$Col2), function(b)
        paste(b$Col2[1],":",round(100*sum(b[,3])/sum(a[,3]),2),"%", sep = "")
        ), collapse = " "))))
#            [,1]       [,2] [,3]               
#[1,] "partner1" "90" "A:22.22% B:77.78%"
#[2,] "partner2" "50" "A:60% C:40%"      
#[3,] "partner3" "90" "B:55.56% C:44.44%"

Upvotes: 1

akuiper
akuiper

Reputation: 215127

You need summarize, and format the string properly, here is an option with sprintf, assuming Col2 is unique within each group of Col1:

df %>% 
      group_by(Col1) %>% 

      # use %s: %.1f%% to format Col2 as string, and the percentage as float rounded to one 
      # decimal place %.1f and the percentage symbol %%
      summarise(Col4 = toString(sprintf("%s: %.1f%%", Col2, Col3 * 100/sum(Col3))), 
                Col3 = sum(Col3))

# A tibble: 3 × 3
#      Col1               Col4  Col3
#    <fctr>              <chr> <int>
#1 partner1 A: 22.2%, B: 77.8%    90
#2 partner2 A: 60.0%, C: 40.0%    50
#3 partner3 B: 55.6%, C: 44.4%    90

Upvotes: 3

Related Questions