I think I am missing something obvious, I have the following data frame
df <- data.frame(type = c("cattle", "mixed", "not stated", "other", "sheep", "cattle", "mixed", "not stated", "other", "sheep", "cattle", "mixed", "not stated", "other", "sheep"),
region = c("EA", "EA", "EA", "EA", "EA", "NW", "NW", "NW", "NW", "NW", "S", "S", "S", "S", "S" ),
number = c(14, 9, 80, 0, 2, 36, 15, 45, 0, 7, 12, 35, 92, 18, 1))
I would like to calculate the proportion of each type within each region. I have tried both:
library(plyr)
ddply(df, .(region, type), mutate,
prop = number/sum(number))
and
transform(df, prop = number/ave(number, region, type, FUN = sum))
Both give:
type region number prop
1 cattle EA 14 1
2 mixed EA 9 1
3 not stated EA 80 1
4 other EA 0 NaN
5 sheep EA 2 1
6 cattle NW 36 1
7 mixed NW 15 1
8 not stated NW 45 1
9 other NW 0 NaN
10 sheep NW 7 1
11 cattle S 12 1
12 mixed S 35 1
13 not stated S 92 1
14 other S 18 1
15 sheep S 1 1
Thanks for reading
You need to group by "region" only when calling ddply.
Try this:
ddply(df, .(region), mutate, prop = number/sum(number))
type region number prop
1 cattle EA 14 0.133333333
2 mixed EA 9 0.085714286
3 not stated EA 80 0.761904762
4 other EA 0 0.000000000
5 sheep EA 2 0.019047619
6 cattle NW 36 0.349514563
7 mixed NW 15 0.145631068
8 not stated NW 45 0.436893204
9 other NW 0 0.000000000
10 sheep NW 7 0.067961165
11 cattle S 12 0.075949367
12 mixed S 35 0.221518987
13 not stated S 92 0.582278481
14 other S 18 0.113924051
15 sheep S 1 0.006329114
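Reason: you want the proportion of each type within a region, so region alone should be the grouping variable. Grouping by both region and type puts every row in its own group, so number/sum(number) reduces to number/number, which is 1, or NaN when number is 0 (because 0/0 is NaN).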
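The same fix applies to the base R attempt from the question: drop type from the grouping factors passed to ave(). The sketch below rebuilds the question's data frame (written with rep() for brevity, same values), confirms that every region/type combination occurs exactly once (which is why the original attempts returned 1 or NaN), and then checks that the proportions sum to 1 within each region. This is an illustrative variant, not the answerer's ddply solution.

```r
# Same data as in the question, written more compactly with rep()
df <- data.frame(
  type   = rep(c("cattle", "mixed", "not stated", "other", "sheep"), times = 3),
  region = rep(c("EA", "NW", "S"), each = 5),
  number = c(14, 9, 80, 0, 2, 36, 15, 45, 0, 7, 12, 35, 92, 18, 1)
)

# Each region/type pair appears exactly once, so grouping by both
# makes every group a single row (prop = 1, or NaN when number is 0)
stopifnot(all(table(df$region, df$type) == 1))

# Base R fix: region is the only grouping factor given to ave()
res <- transform(df, prop = number / ave(number, region, FUN = sum))

# Within each region the proportions should sum to 1 (up to rounding)
tapply(res$prop, res$region, sum)
```

The resulting prop column should match the ddply output above, e.g. 14/105 for cattle in EA.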