user3077008

Reputation: 847

Calculating proportions by using ddply

I am trying to find topic proportions for each ICPSR. The data looks like this.

           ICPSR       date day     month year mention topic
   169538 15444 2009-06-02   2      June 2009       1    18
   169544 15444 2010-03-02   2     March 2010       1    20
   169581 15444 2010-09-30  30 September 2010       1    18
   169609 15444 2009-06-03   3      June 2009       1     1
   169729 20909 2009-11-17  17  November 2009       1     9
   169791 29317 2009-03-13  13     March 2009       1    13

For each ICPSR, I want the proportion of each topic. My desired output looks like the following.

     ICPSR  topic.1 topic.9 topic.13 topic.18 topic.20  
   1 15444   0.25      0        0       0.5      0.25
   2 20909    0        1        0        0        0
   3 29317    0        0        1        0        0

I tried using ddply, like this: ddply(c.analyze1, c("ICPSR"), summarize, sum(mention)) / ddply(c.analyze1, c("ICPSR", "topic"), summarize, sum(mention)). But this doesn't give me the output I want.

Any command or code suggestions would be appreciated. Thank you!

Upvotes: 1

Views: 210

Answers (1)

Rich Scriven

Reputation: 99331

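You don't really need ddply for this. You can use prop.table on a two-way table of ICPSR by topic; a margin argument of 1 divides each row by its row total, which gives the per-ICPSR proportions.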

If df is your data,

prop.table(table(df$ICPSR, df$topic), 1)
#       
#           1    9   13   18   20
#  15444 0.25 0.00 0.00 0.50 0.25
#  20909 0.00 1.00 0.00 0.00 0.00
#  29317 0.00 0.00 1.00 0.00 0.00

Upvotes: 2
