Thiago Fernandes

Reputation: 273

How to calculate proportion by groups with dplyr?

My dataset has two groups, A and B, totaling 160 rows.

Within each group, I would like to calculate the percentage of rows where Espessura is below 3.5, above 6.5, or outside that range.

My Dataset

Dados = structure(list(Espessura = c(5.7, 4.3, 5.7, 5.3, 3.1, 3, 3.6, 
5.9, 4.4, 3.1, 5.8, 3.7, 5.9, 5.3, 6.7, 6, 4.2, 4.1, 2.8, 4.3, 
4.6, 4.7, 3.1, 5, 2.6, 5.2, 6.2, 5.4, 5.7, 3.4, 5.4, 6.9, 5.8, 
4, 5.8, 5.4, 4.7, 5.9, 3.6, 3.5, 5.9, 5.4, 6.5, 4.2, 4.4, 2.4, 
5.3, 6.2, 4.5, 5.9, 4.1, 6.7, 5.8, 5.9, 2.9, 6.8, 5.7, 3.5, 3.5, 
6.1, 5.5, 5.6, 4, 3.9, 3.8, 2.8, 5.5, 3.5, 5.5, 4.1, 2.9, 5.7, 
5.7, 2.7, 3.7, 5.6, 3.8, 5.9, 3, 4.9, 4.9, 6.5, 3.9, 2.3, 4.5, 
6.4, 5.8, 5.7, 5.1, 2.9, 6, 5.8, 5.1, 4.5, 4.5, 4, 5.4, 7, 3.3, 
6, 3.1, 6.3, 4.3, 5.3, 4.9, 5.6, 6, 2.8, 5.6, 3.5, 4, 6.5, 4.6, 
6.2, 6.4, 4, 2.4, 5.7, 6.3, 5.3, 3.7, 6.1, 5.7, 5.7, 3.7, 5.6, 
6.1, 3, 3.8, 5.7, 6.6, 5.8, 3.3, 2.7, 5.7, 6.4, 5.8, 3.5, 5.4, 
4.2, 6.1, 5.3, 5.4, 3.1, 5.1, 3.9, 6.4, 3.4, 6.7, 2.4, 5.1, 5.7, 
3.1, 6.2, 6.3, 4.9, 6.5, 4.5, 6.1, 5.7), Turma = structure(c(2L, 
1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 
1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 
2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 
2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L), .Label = c("A", 
"B"), class = "factor")), class = "data.frame", row.names = c(NA, 
-160L))

My Script

library("dplyr")
    Dados %>%
      group_by(Turma) %>%
      summarise(n = n())  %>%
      mutate(menor = dim(filter(Dados, Espessura < 3.5))/ n*100) %>%
      mutate(maior = dim(filter(Dados, Espessura > 6.5)) /n*100) %>%
      mutate(fora = dim(filter(Dados, Espessura < 3.5 |  Espessura > 6.5))/n*100)

Wrong Result

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 5
  Turma     n menor maior  fora
  <fct> <int> <dbl> <dbl> <dbl>
1 A        80  32.5  8.75  41.2
2 B        80   2.5  2.5    2.5

Correct Result

(Image of the expected output table; not reproduced here.)

Upvotes: 0

Views: 279

Answers (2)

Vinson Ciawandy

Reputation: 1166

library(dplyr)
library(magrittr)
Dados %>%
  group_by(Turma) %>%
  summarise(n = n(),
            menor = mean( Espessura < 3.5)*100,
            maior = mean( Espessura > 6.5)*100,
            fora = mean( Espessura < 3.5 |  Espessura > 6.5)*100)

This gives:

# A tibble: 2 x 5
  Turma     n menor maior  fora
  <fct> <int> <dbl> <dbl> <dbl>
1 A        80  32.5  0    32.5 
2 B        80   0    8.75  8.75
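
The reason mean() works on a condition can be sketched with a toy example. The values below are made up for illustration and are not taken from Dados; the names espessura, turma and below are likewise just placeholders. A comparison such as espessura < 3.5 yields a logical vector, and R treats TRUE as 1 and FALSE as 0, so the mean is the share of rows that satisfy the condition.

```r
# Toy data (hypothetical values, not the question's dataset)
espessura <- c(3.0, 4.0, 7.0, 5.0, 2.0, 6.0, 6.8, 3.2)
turma     <- c("A", "A", "A", "A", "B", "B", "B", "B")

# Logical vector: TRUE where the value is below the threshold
below <- espessura < 3.5

# TRUE counts as 1 and FALSE as 0, so the mean is the proportion of TRUEs
pct_overall <- mean(below) * 100

# The same idea per group, using base R's tapply()
pct_by_group <- tapply(below, turma, mean) * 100

print(pct_overall)
print(pct_by_group)
```

Inside summarise() after group_by(), the column is already restricted to the current group, which is why the dplyr code above needs no explicit filter.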

Upvotes: 2

Peace Wang

Reputation: 2419

First, here is a data.table approach that gives the correct result.

library(data.table)
dt <- setDT(Dados)
dt[,.(n = .N,
      menor = nrow(.SD[Espessura < 3.5])/.N*100,
      maior = nrow(.SD[Espessura > 6.5])/.N*100,
      fora = nrow(.SD[Espessura < 3.5 | Espessura > 6.5])/.N*100),
   by = Turma]

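The first mistake in your script is dim(filter(Dados, Espessura < 3.5)). It filters the whole of Dados rather than the current group, and dim() returns two numbers (rows and columns), here 26 2, instead of a single per-group count. Those two numbers are then paired with the two group rows, which is why A shows 26/80*100 = 32.5 and B shows 2/80*100 = 2.5.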
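
To see why dim() causes trouble here, a small base-R sketch with made-up toy data (not Dados) may help. The names df, d, n, wrong and right are only illustrative. dim() returns both the row count and the column count, and dividing that length-2 vector by the two group sizes pairs them element by element.

```r
# Toy data frame (hypothetical values): two groups of four rows each
df <- data.frame(
  x = c(1, 2, 5, 6, 3, 7, 8, 9),
  g = c("A", "A", "A", "A", "B", "B", "B", "B")
)

# dim() of the filtered data: number of matching rows in the WHOLE data
# frame, followed by the number of columns
d <- dim(df[df$x < 4, ])

# Group sizes, as summarise(n = n()) would report them
n <- c(4, 4)

# Element-wise division: rows / n[1] for the first group,
# columns / n[2] for the second group
wrong <- d / n * 100

# Per-group proportion computed from the condition itself
right <- tapply(df$x < 4, df$g, mean) * 100

print(d)
print(wrong)
print(right)
```

The second element of wrong is driven by the column count, so it has nothing to do with how many rows in the second group satisfy the condition.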

Upvotes: 1
