George

Reputation: 5681

Find percentages in categorical data

I want to calculate percentages of categorical data.

I have the following dataset.

library(tidyverse)

tib <- tibble(a = c("yes", "yes", "yes", "yes"),
              b = c("yes", "yes", "no", "yes"),
              c = c("AB", "yes", "AC", "no"),
              d = c("AC", "yes", "no", "AB"),
              space = c("UP", "DO", "UP", "TA"))

I want to find the percentages for each of the columns a, b, c and d, grouped by space.

For example, for column "a":

df_perc <- as.data.frame(prop.table(table(tib$space, tib$a)) * 100)

which gives:

  Var1 Var2 Freq
1   DO  yes   25
2   TA  yes   25
3   UP  yes   50

which is right.

Now, to avoid repeating this for each column, I am trying to use gather:

df_tidy <- tib %>%
    gather(key="let", value="response", -"space")
   

but I still need to group by "space" somehow, and then do something like this:

df_perc <- as.data.frame(prop.table(table(df_tidy$let, df_tidy$response)) * 100)

Upvotes: 0

Views: 253

Answers (1)

DaveArmstrong

Reputation: 21757

How about this:

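tib %>%
  # long format: one row per (space, original column, value)
  pivot_longer(-space, names_to = "vars", values_to = "vals") %>%
  # count each space / column / value combination
  group_by(space, vars, vals) %>% count() %>%
  ungroup() %>%
  # percentage of each original column's total
  group_by(vars) %>%
  mutate(pct = (n / sum(n)) * 100) %>%
  select(-n) %>%
  # back to wide: one percentage column per original column
  pivot_wider(names_from = "vars", values_from = "pct", values_fill = 0)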
# # A tibble: 8 x 6
#   space vals      a     b     c     d
#   <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 DO    yes      25    25    25    25
# 2 TA    yes      25    25     0     0
# 3 TA    no        0     0    25     0
# 4 TA    AB        0     0     0    25
# 5 UP    yes      50    25     0     0
# 6 UP    no        0    25     0    25
# 7 UP    AB        0     0    25     0
# 8 UP    AC        0     0    25    25
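
If you would rather stay close to the prop.table() call from the question, a base R sketch along these lines might also work. It simply repeats that call once per column with lapply() and stacks the results. The names cols, perc_list and the helper column vars are my own choices, not anything from the question, and the data is rebuilt as a plain data.frame only so the snippet runs without the tidyverse.

```r
# Same values as the question's tibble, as a plain data frame (assumption:
# a data.frame is acceptable here; a tibble would work the same way)
tib <- data.frame(
  a = c("yes", "yes", "yes", "yes"),
  b = c("yes", "yes", "no", "yes"),
  c = c("AB", "yes", "AC", "no"),
  d = c("AC", "yes", "no", "AB"),
  space = c("UP", "DO", "UP", "TA")
)

# Columns to summarise (hypothetical helper vector)
cols <- c("a", "b", "c", "d")

# For each column: cross-tabulate space against the column's values and
# turn the counts into percentages of that column's total
perc_list <- lapply(cols, function(col) {
  tab <- prop.table(table(space = tib$space, vals = tib[[col]])) * 100
  out <- as.data.frame(tab, stringsAsFactors = FALSE)
  out$vars <- col  # remember which original column this block came from
  out
})

# Stack the per-column results into one long data frame
df_perc <- do.call(rbind, perc_list)

# Drop space / value combinations that never occur
df_perc <- df_perc[df_perc$Freq > 0, ]
```

For column "a" this gives the same 25 / 25 / 50 split as the single-column call in the question. The result stays in long format (space, vals, Freq, vars), so it would still need reshaping if you want one column per variable as in the output above.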

Upvotes: 2
