Reputation: 49
I have a dataset in which there are following columns: flavor, flavorid and unitSoled.
Flavor Flavorid unitsoled
beans 350 6
creamy 460 2
.
.
.
I want to find top ten flavors and then calculate market share for each flavor. My logic is market share for each flavor = units soled for particular flavor divided by total units soled.
How do I implement this. For output I just want two col Flavorid and corresponding market share. Do I need to save top ten flavors in some table first?
Upvotes: 0
Views: 138
Reputation: 37879
One way is with the dplyr
package:
An example data set:
flavor <- rep(letters[1:15],each=5)
flavorid <- rep(1:15,each=5)
unitsold <- 1:75
df <- data.frame(flavor,flavorid,unitsold)
> df
flavor flavorid unitsold
1 a 1 1
2 a 1 2
3 a 1 3
4 a 1 4
5 a 1 5
6 b 2 6
7 b 2 7
8 b 2 8
9 b 2 9
...
...
Solution:
library(dplyr)
df %>%
select(flavorid,unitsold) %>% #select the columns you want
group_by(flavorid) %>% #group by flavorid
summarise(total=sum(unitsold)) %>% #sum the total units sold per id
mutate(marketshare=total/sum(total)) %>% #calculate the market share per id
arrange( desc(marketshare)) %>% #order by marketshare descending
head(10) #pick the 10 first
#and you can add another select(flavorid,marketshare) if you only want those two
Output:
Source: local data frame [10 x 3]
flavorid total marketshare
1 15 365 0.12807018
2 14 340 0.11929825
3 13 315 0.11052632
4 12 290 0.10175439
5 11 265 0.09298246
6 10 240 0.08421053
7 9 215 0.07543860
8 8 190 0.06666667
9 7 165 0.05789474
10 6 140 0.04912281
Upvotes: 4