Amir Rahbaran

Reputation: 2430

Group by variable and show the value as a percentage with R and dplyr

Imagine this is the structure of my data frame hrd:

'data.frame':   14999 obs. of  2 variables:
 $ left                 : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 
 $ sales                : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...

I want to know the percentage of people who have left (0 = stayed, 1 = left) for each level of sales.

This is the closest I have come:

hrd %>% group_by(sales) %>% count(left)

However, the output is this:

         sales   left     n
        <fctr> <fctr> <int>
1   accounting      0   563
2   accounting      1   204
3           hr      0   524
4           hr      1   215
5           IT      0   954
6           IT      1   273
7   management      0   539
8   management      1    91
9    marketing      0   655
10   marketing      1   203
11 product_mng      0   704
12 product_mng      1   198
13       RandD      0   666
14       RandD      1   121
15       sales      0  3126
16       sales      1  1014
17     support      0  1674
18     support      1   555
19   technical      0  2023
20   technical      1   697

I'm trying something like this:

 hrd %>% group_by(sales) %>%
     summarise(count = n()) %>%
     mutate(leaving_rate = count(left == 1) / count)

But I get this error message:

Error: object 'left' not found

Upvotes: 0

Views: 70

Answers (1)

Nate

Reputation: 10671

Don't call summarise() first: it collapses your data frame to one row per group. That drops the column "left" (and every other column that is neither a grouping variable nor created in the call) and keeps only "sales" (the grouping variable) and "count" (the column you created). That is why the later mutate() cannot find left.
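To see the column disappear, here is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical stand-ins for hrd, not the real data:

```r
library(dplyr)

# Toy stand-in for hrd (hypothetical values, not the real data)
toy <- data.frame(
  sales = factor(c("hr", "hr", "IT", "IT", "IT")),
  left  = factor(c("0", "1", "0", "0", "1"), levels = c("0", "1"))
)

counts <- toy %>%
  group_by(sales) %>%
  summarise(count = n())

# Only the grouping column and the new summary column survive;
# "left" is no longer available to any later step in the pipe
names(counts)
```

Any later mutate() on counts can only use sales and count, so anything that needs left has to be computed inside summarise() itself.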

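You can do it in one summarise() call. Because left is a factor, compare it against the level "1" instead of summing it directly (sum() is not defined for factors):

hrd %>% group_by(sales) %>%
    # proportion of rows with left == "1"; multiply by 100 for a percentage
    summarise(percent_left = sum(left == "1") / n())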
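If you want to try this without the full data set, here is a small sketch on a made-up data frame (toy and its values are hypothetical, not the real hrd). It assumes left is a factor with levels "0" and "1", as in the str() output in the question, and it also shows a second route that builds on the count() attempt to get the percentage of both outcomes per department:

```r
library(dplyr)

# Toy stand-in for hrd (hypothetical values, not the real data)
toy <- data.frame(
  sales = factor(c("hr", "hr", "IT", "IT", "IT", "IT")),
  left  = factor(c("0", "1", "0", "0", "0", "1"), levels = c("0", "1"))
)

# Percentage of leavers per department:
# mean() of a logical vector is the share of TRUE values
leaving <- toy %>%
  group_by(sales) %>%
  summarise(percent_left = 100 * mean(left == "1"))
# hr: 1 of 2 left (50), IT: 1 of 4 left (25)

# Percentage of both outcomes per department, starting from count()
both <- toy %>%
  count(sales, left) %>%
  group_by(sales) %>%
  mutate(percent = 100 * n / sum(n)) %>%
  ungroup()
```

The first form gives one row per department; the second keeps one row per department and outcome, so the percentages within each department add up to 100.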

Upvotes: 1
