Reputation: 2430
Imagine this is the structure of my data frame hrd:
'data.frame': 14999 obs. of 2 variables:
$ left : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2
$ sales : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
I want to know the percentage of people who have left (0 = stayed, 1 = left) for each level of sales.
This is the closest I have come:
hrd %>% group_by(sales) %>% count(left)
However, the output is this:
sales left n
<fctr> <fctr> <int>
1 accounting 0 563
2 accounting 1 204
3 hr 0 524
4 hr 1 215
5 IT 0 954
6 IT 1 273
7 management 0 539
8 management 1 91
9 marketing 0 655
10 marketing 1 203
11 product_mng 0 704
12 product_mng 1 198
13 RandD 0 666
14 RandD 1 121
15 sales 0 3126
16 sales 1 1014
17 support 0 1674
18 support 1 555
19 technical 0 2023
20 technical 1 697
I'm trying something like this:
hrd %>% group_by(sales) %>%
  summarise(count = n()) %>%
  mutate(leaving_rate = count(left == 1) / count)
But I get this error message:
Error: object 'left' not found
Upvotes: 0
Views: 70
Reputation: 10671
Don't call summarise() first: it collapses your data frame to one row per group. That drops the column "left" (and any other column that is neither a grouping variable nor created in the call) and keeps only "sales" (the grouping variable) and "count" (the variable you created). By the time mutate() runs, "left" no longer exists, hence the error.
You can do it in one summarise() call like this:
hrd %>% group_by(sales) %>%
  # left is a factor, so compare it to "1" rather than summing it directly
  summarise(percent_left = sum(left == "1") / n())
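To check the idea without the full data set, here is a minimal sketch on a made-up data frame. The name toy and its six rows are hypothetical; only the column names and factor types mirror hrd. It shows the single summarise() call, plus an alternative that continues from the count() output in the question.

```r
library(dplyr)

# Hypothetical toy data with the same column types as hrd (both factors)
toy <- data.frame(
  left  = factor(c("1", "0", "0", "1", "1", "0")),
  sales = factor(c("hr", "hr", "hr", "IT", "IT", "IT"))
)

# One summarise() call: the mean of a logical vector is the share of TRUE values
rates <- toy %>%
  group_by(sales) %>%
  summarise(leaving_rate = mean(left == "1"))

# Alternative that starts from per-group counts, as in the question's attempt
rates_from_counts <- toy %>%
  count(sales, left) %>%
  group_by(sales) %>%
  mutate(share = n / sum(n)) %>%   # share of each left value within its sales level
  filter(left == "1")              # keep only the rows for people who left
```

Both results are proportions between 0 and 1; multiply by 100 if you want percentages. As a sanity check against the counts in the question, accounting would come out at 204 / (563 + 204), roughly 26.6%.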
Upvotes: 1