Transform frequencies in factor levels into percentages

Question

Consider the following data frame:

x <-c(rep (c ("s1", "s2", "s3"),each=5 ))
y <- c(rep(c("a", "b", "c", "d", "e"), 3) )               
z<-c(1:15)    

x_name <- "dimensions"
y_name <- "aspects"
z_name<-"value"
df <- data.frame(x,y,z)
names(df) <- c(x_name,y_name, z_name)

How I can calculate and add a new column of percentages for each factor level? In this case percentage of 'value' for each 'dimension' across 'aspects' so that I have something like this:

To illustrate: 7 in percentage represents the percentage of value 1 taken from the total of values under aspects corresponding to dimension s1 (15 in this case) etc.

I searched the forum, but the answers in creating percentages go only across one column and not two. I finally produced this column, by first calculating in a separate df total value in aspects across dimensions, using aggregate. Then I merged the initial df with the new one with merge and created a percentage column by dividing the value column to total value. But I feel that this is very clumsy. Is there a simpler way to do this?

josliber · Accepted Answer

In base R you can use ave to do these sorts of grouped calculations:

df$percentage <- ave(df$value, df$dimensions, FUN=function(x) 100*x/sum(x))
df
#    dimensions aspects value percentage
# 1          s1       a     1   6.666667
# 2          s1       b     2  13.333333
# 3          s1       c     3  20.000000
# 4          s1       d     4  26.666667
# 5          s1       e     5  33.333333
# 6          s2       a     6  15.000000
# 7          s2       b     7  17.500000
# 8          s2       c     8  20.000000
# 9          s2       d     9  22.500000
# 10         s2       e    10  25.000000
# 11         s3       a    11  16.923077
# 12         s3       b    12  18.461538
# 13         s3       c    13  20.000000
# 14         s3       d    14  21.538462
# 15         s3       e    15  23.076923

In dplyr you could use group_by and mutate:

library(dplyr)
df %>% group_by(dimensions) %>% mutate(percentage=100*value/sum(value))
# Source: local data frame [15 x 4]
# Groups: dimensions [3]
# 
#    dimensions aspects value percentage
#        (fctr)  (fctr) (int)      (dbl)
# 1          s1       a     1   6.666667
# 2          s1       b     2  13.333333
# 3          s1       c     3  20.000000
# 4          s1       d     4  26.666667
# 5          s1       e     5  33.333333
# 6          s2       a     6  15.000000
# 7          s2       b     7  17.500000
# 8          s2       c     8  20.000000
# 9          s2       d     9  22.500000
# 10         s2       e    10  25.000000
# 11         s3       a    11  16.923077
# 12         s3       b    12  18.461538
# 13         s3       c    13  20.000000
# 14         s3       d    14  21.538462
# 15         s3       e    15  23.076923

Any sort of rounding desired could be performed by wrapping the percentage calculation in the round function and passing the desired precision.

Transform frequencies in factor levels into percentages

Answers (2)

Related Questions