I am trying to find topic proportions for each ICPSR. The data looks like this.
       ICPSR       date day     month year mention topic
169538 15444 2009-06-02   2      June 2009       1    18
169544 15444 2010-03-02   2     March 2010       1    20
169581 15444 2010-09-30  30 September 2010       1    18
169609 15444 2009-06-03   3      June 2009       1     1
169729 20909 2009-11-17  17  November 2009       1     9
169791 29317 2009-03-13  13     March 2009       1    13
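To make the example reproducible, here is a minimal sketch that rebuilds the sample rows above as a data frame called df, the name the answer uses. Storing date as Date and month as character is an assumption about the original column types; the row names are the row IDs from the printout.

```r
# Rebuild the question's sample rows; the column types are assumptions.
df <- data.frame(
  ICPSR   = c(15444, 15444, 15444, 15444, 20909, 29317),
  date    = as.Date(c("2009-06-02", "2010-03-02", "2010-09-30",
                      "2009-06-03", "2009-11-17", "2009-03-13")),
  day     = c(2, 2, 30, 3, 17, 13),
  month   = c("June", "March", "September", "June", "November", "March"),
  year    = c(2009, 2010, 2010, 2009, 2009, 2009),
  mention = c(1, 1, 1, 1, 1, 1),
  topic   = c(18, 20, 18, 1, 9, 13),
  # Row IDs from the printout, kept as row names.
  row.names = c("169538", "169544", "169581", "169609", "169729", "169791")
)
str(df)
```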
I want to find, for each ICPSR, the proportion of each topic. The output I want looks like this:
  ICPSR topic.1 topic.9 topic.13 topic.18 topic.20
1 15444    0.25       0        0      0.5     0.25
2 20909       0       1        0        0        0
3 29317       0       0        1        0        0
I was trying to use ddply, like:
ddply(c.analyze1, c("ICPSR"), summarize, sum(mention))/ddply(c.analyze1, c("ICPSR","topic"), summarize, sum(mention))
But this doesn't give the output I want.
Any command or code suggestions would be appreciated. Thank you!
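You don't really need ddply for this. You can use prop.table on a two-way table of ICPSR by topic: with margin 1, each count is divided by its row (ICPSR) total, so every row sums to 1.
If df is your data,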
prop.table(table(df$ICPSR, df$topic), 1)
#
# 1 9 13 18 20
# 15444 0.25 0.00 0.00 0.50 0.25
# 20909 0.00 1.00 0.00 0.00 0.00
# 29317 0.00 0.00 1.00 0.00 0.00
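table() counts rows, which matches summing mention here only because every mention in the sample is 1. If mention can take other values and should act as a weight (as the sum(mention) in the question suggests), one possible sketch is to let xtabs() sum mention within each ICPSR/topic cell, apply the same prop.table step, and then reshape the result into the layout from the question. The small data frame holds just the relevant sample columns, and the names tab, props and wide are illustrative.

```r
# Relevant columns of the question's sample data.
df <- data.frame(
  ICPSR   = c(15444, 15444, 15444, 15444, 20909, 29317),
  mention = c(1, 1, 1, 1, 1, 1),
  topic   = c(18, 20, 18, 1, 9, 13)
)

# Sum `mention` within each ICPSR x topic cell (a weighted count).
tab <- xtabs(mention ~ ICPSR + topic, data = df)

# margin = 1: divide every cell by its ICPSR (row) total.
props <- prop.table(tab, 1)

# Reshape into the wanted layout: one row per ICPSR, one topic.<k> column per topic.
wide <- as.data.frame.matrix(props)
names(wide) <- paste0("topic.", names(wide))
wide <- data.frame(ICPSR = as.numeric(rownames(wide)), wide, row.names = NULL)
print(wide)
```

With the sample data this yields the same proportions as the wanted output; the xtabs() step only makes a difference once some mention values differ from 1.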