Reputation: 21
I have a dataset; a small subset of the relevant columns is shown below:
year ID type result
2003 1 new closed
2003 2 new transferred
2003 3 subsequent closed
2003 4 subsequent diverted
....
2015 1000 new closed
What I want to calculate is the fraction of subsequents, i.e. (no. of subsequents) / (no. of subsequents + no. of news), grouped by year and result, like so:
year result subsequent_frac
2003 closed 0.10
2003 transferred 0.05
2003 ....
....
2015 closed 0.05
2015 transferred 0.1
I know I can do it in steps, with a group_by and summarise to get the counts and then handle each result separately, but I was wondering if there is a neater/faster way to do this.
Upvotes: 2
Views: 142
Reputation: 3250
Is this what you are looking for? Applying summarise() removes one level of grouping, hence the second group_by (commented out here, since you may not need it).
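library(dplyr)

# Count rows per (year, type); summarise() then drops the last grouping level (type)
dfSummarized <- group_by(df, year, type) %>%
  summarise(subsequent_frac = n()) %>%
  # group_by(type) %>%  # maybe you don't need this?
  # Still grouped by year here, so freq is each type's share of that year's rows
  mutate(freq = subsequent_frac / sum(subsequent_frac))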
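If the fraction is wanted per year and result (as in the desired output in the question) rather than per year and type, one possible sketch is to group by those two columns and take the mean of a logical test on type. This assumes type only ever contains the strings "new" and "subsequent"; the toy data frame and the name subsequent_frac_by_group are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data using the column names from the question
df <- data.frame(
  year   = c(2003, 2003, 2003, 2003, 2003, 2004, 2004),
  ID     = 1:7,
  type   = c("new", "new", "subsequent", "subsequent", "new", "subsequent", "new"),
  result = c("closed", "transferred", "closed", "diverted", "closed", "closed", "closed"),
  stringsAsFactors = FALSE
)

# Assumption: type is either "new" or "subsequent", so the mean of the
# logical vector equals subsequents / (subsequents + news) in each group
subsequent_frac_by_group <- df %>%
  group_by(year, result) %>%
  summarise(subsequent_frac = mean(type == "subsequent")) %>%
  ungroup()

print(subsequent_frac_by_group)
```

This does the counting and the division in a single summarise, so there is no need to handle each result separately.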
Upvotes: 1