ddply in R: for each group, find the percentage of occurrence for a particular variable

Question

I have a dataset with the columns user_type, imp_date, and lag (response lag time, in days):

          user_type imp_date lag 
           Consumer 20130613   1  
           Consumer 20130612   2  
           Consumer 20130611   3  
           Consumer 20130612   3  
           Producer 20130610  10  
           Producer 20130614   5  
           Producer 20130613   7

I would like to calculate the percentage breakdown of lag for each user_type. Here is an example of the output I would like:

user_type        lag    percentage
---------        ---    ----------
Consumer         1      0.25
Consumer         2      0.25
Consumer         3      0.5
Producer         5      0.333
Producer         7      0.333
Producer         10     0.333

The percentage breakdown of lag time response is calculated with respect to the total of each user_type group.

Specifically, I would like to use ddply from plyr, and I have something along the lines of:

a = ddply(data, .(user_type), summarize, table(lag)/length(lag))

but it's not giving me the lag time response column.
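For reference, here is a minimal sketch of one way the desired table could be built with ddply, assuming plyr is installed. It groups by both user_type and lag so the lag values survive as a column, then divides each count by its user_type total. The names counts, result, and n are only illustrative, and the data frame is retyped from the sample above.

```r
library(plyr)

# Sample data retyped from the question
data <- data.frame(
  user_type = c("Consumer", "Consumer", "Consumer", "Consumer",
                "Producer", "Producer", "Producer"),
  imp_date  = c(20130613, 20130612, 20130611, 20130612,
                20130610, 20130614, 20130613),
  lag       = c(1, 2, 3, 3, 10, 5, 7)
)

# Step 1: count rows for each (user_type, lag) pair;
# grouping on lag keeps it as a column in the output
counts <- ddply(data, .(user_type, lag), summarize, n = length(lag))

# Step 2: within each user_type, divide each count by the group total
result <- ddply(counts, .(user_type), transform, percentage = n / sum(n))

print(result)
```

Doing it in two passes keeps each step easy to check: counts holds the raw frequencies, and result adds the share of each lag within its user_type.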

P.S. My original motivation was to plot the lag distribution for each user type, and I have:

p <- ggplot(data, aes(x = lag, fill = factor(user_type))) 
p + geom_bar(aes(y = (..count..)/sum(..count..)))

but the percentage breakdown of lag for each user_type seems to be incorrect (i.e., the percentage is calculated with respect to each lag group, not each user_type group). As a result, I decided to transform my dataset before plotting. If there is an easier way, please share.
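A possible way to get within-group proportions directly in the plot, sketched under the assumption of a recent ggplot2 (3.3.0 or later, where after_stat() is available): setting group = user_type makes the computed prop variable a share of each user_type rather than of the whole dataset. The data frame is retyped from the sample above, and position = "dodge" is just one layout choice.

```r
library(ggplot2)

# Sample data retyped from the question
data <- data.frame(
  user_type = c("Consumer", "Consumer", "Consumer", "Consumer",
                "Producer", "Producer", "Producer"),
  lag       = c(1, 2, 3, 3, 10, 5, 7)
)

# prop is computed per group, so grouping by user_type yields
# count / (total count for that user_type) for each lag value
p <- ggplot(data, aes(x = factor(lag), fill = user_type)) +
  geom_bar(aes(y = after_stat(prop), group = user_type),
           position = "dodge") +
  labs(x = "lag", y = "proportion within user_type")

print(p)
```

On older ggplot2 versions without after_stat(), plotting a pre-computed summary table with geom_bar(stat = "identity") is the more portable route.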

Thanks!
