Reputation: 5681
I want to calculate percentages of categorical data.
I have the following dataset.
library(tidyverse)
tib <- tibble(a = c("yes", "yes", "yes", "yes"),
              b = c("yes", "yes", "no", "yes"),
              c = c("AB", "yes", "AC", "no"),
              d = c("AC", "yes", "no", "AB"),
              space = c("UP", "DO", "UP", "TA"))
I want to find the percentage of each value in columns a, b, c and d, grouped by space.
For example, for column "a":
df_perc <- as.data.frame(prop.table(table(tib$space, tib$a)) * 100)
which gives:
Var1 Var2 Freq
1 DO yes 25
2 TA yes 25
3 UP yes 50
which is right.
Now, in order not to do this for each column, I am trying to use gather:
df_tidy <- tib %>%
  gather(key = "let", value = "response", -"space")
but I must somehow group by "space" as well, and then do something like this:
df_perc <- as.data.frame(prop.table(table(df_tidy$let, df_tidy$response)) * 100)
Upvotes: 0
Views: 253
Reputation: 21757
How about this:
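tib %>%
  # reshape to long format: one row per (space, column, value)
  pivot_longer(-space, names_to = "vars", values_to = "vals") %>%
  # count how often each combination occurs
  group_by(space, vars, vals) %>% count() %>%
  ungroup() %>%
  # turn counts into percentages of each original column's total
  group_by(vars) %>%
  mutate(pct = (n / sum(n)) * 100) %>%
  select(-n) %>%
  pivot_wider(names_from = "vars", values_from = "pct", values_fill = 0)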
# # A tibble: 8 x 6
# space vals a b c d
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 DO yes 25 25 25 25
# 2 TA yes 25 25 0 0
# 3 TA no 0 0 25 0
# 4 TA AB 0 0 0 25
# 5 UP yes 50 25 0 0
# 6 UP no 0 25 0 25
# 7 UP AB 0 0 25 0
# 8 UP AC 0 0 25 25
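As a quick sanity check, here is a minimal sketch that compares the long-format percentages for column "a" with the base R `prop.table()` result from the question. It assumes the `tib` from the question with `space` placed inside the `tibble()` call; the names `long_pct`, `pipe_a` and `base_a` are just illustrative, and `count(space, vars, vals)` is used as a shorthand for the `group_by()` plus `count()` step above.

```r
library(dplyr)
library(tidyr)

# Data from the question (assumption: `space` belongs inside tibble())
tib <- tibble(a = c("yes", "yes", "yes", "yes"),
              b = c("yes", "yes", "no", "yes"),
              c = c("AB", "yes", "AC", "no"),
              d = c("AC", "yes", "no", "AB"),
              space = c("UP", "DO", "UP", "TA"))

# Long format: one row per (space, column, value) with its percentage
long_pct <- tib %>%
  pivot_longer(-space, names_to = "vars", values_to = "vals") %>%
  count(space, vars, vals) %>%
  group_by(vars) %>%
  mutate(pct = n / sum(n) * 100) %>%
  ungroup()

# Percentages for column "a" from the pipeline, ordered by space
pipe_a <- long_pct %>%
  filter(vars == "a") %>%
  arrange(space)

# The same percentages computed the base R way, as in the question
base_a <- as.data.frame(prop.table(table(tib$space, tib$a)) * 100)

# Should be TRUE if both approaches agree
all.equal(pipe_a$pct, base_a$Freq)
```

If the two agree for "a", the same pipeline should give the matching numbers for b, c and d, since every column is handled identically after `pivot_longer()`.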
Upvotes: 2