Reputation: 73
I have successfully made a stacked barplot in R where the percentages add up to 100% for several different categories. The dataframe looks like this:
sujeito epentese vozeamento teste posicao palavra tipo ortografia cseguinte
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 a 1 1 P L alpes ps ces d_v
2 a 0 1 P L crepes ps ces d_v
3 a 0 0 P L chopes ps ces d_v
4 a 1 0 P L jipes ps ces d_d
5 a 1 0 P L naipes ps ces d_d
6 a 0 0 P L xaropes ps ces d_d
7 a 0 0 P L artes ts ces d_v
8 a 0 0 P L botes ts ces d_v
9 a 0 0 P L dentes ts ces d_v
10 a 0 0 P L potes ts ces d_d
# ... with 421 more rows
Then I used ggplot and dplyr to make a stacked barplot displaying these percentages. I used this code:
dadospb %>%
  group_by(tipo, epentese) %>%
  summarise(quantidade = n()) %>%
  mutate(frequencia = quantidade / sum(quantidade)) %>%
  ggplot(., aes(x = tipo, y = frequencia, fill = epentese)) +
  geom_col(position = position_fill(reverse = FALSE)) +
  geom_text(aes(label = if_else(epentese == 1, scales::percent(frequencia, accuracy = 1), "")), vjust = 0, nudge_y = .01) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Epenthesis rates by cluster type on L1 Portuguese") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Cluster Type") + ylab("Frequency")
My intention, though, is to make it look like the graph on the right side of this picture, with the columns arranged in descending order:
I have tried different packages and also manipulating group_by, but still no luck. I hope this isn't too redundant. The tutorials I've come across on the web involve manipulating the Tidyverse, of which I have only elementary knowledge. Thanks in advance!
Upvotes: 1
Views: 1638
Reputation: 66415
I like using the forcats package for ordering categories before they get into ggplot. In this case, we could use fct_inorder after sorting the data by epentese in descending order (so 1 appears first) and then by descending frequencia. tipo then becomes a factor whose levels follow that row order, and ggplot will plot it in that order. (See how cluster 4 comes before cluster 3 in my made-up data.)
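As a quick illustration of what fct_inorder does, here is a minimal sketch on a made-up character vector (the values are hypothetical, not taken from the question's data). It compares the default alphabetical levels with levels in order of first appearance:

```r
library(forcats)

# Hypothetical toy vector, not from the question's data
x <- c("ts", "ps", "ts", "ks")

# Default factor(): levels are sorted alphabetically
default_levels <- levels(factor(x))

# fct_inorder(): levels follow the order in which values first appear
inorder_levels <- levels(fct_inorder(x))

print(default_levels)
print(inorder_levels)
```

This is why the rows are sorted first: whatever order the rows are in when fct_inorder runs becomes the order of the bars.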
I used mtcars but renamed to have your data's names:
library(dplyr); library(forcats)
# Prep to make mtcars look like your data
mtcars %>%
  mutate(vs = as.character(vs)) %>%
  group_by(tipo = carb, epentese = vs) %>%
  summarise(quantidade = sum(wt)) %>%
  mutate(frequencia = quantidade / sum(quantidade)) %>%
  ungroup() %>%
  # Arrange in the way you want and then make tipo a factor
  # I want epentese = 1 first, then descending frequencia
  # ggplot displays a factor in the order of its levels
  arrange(desc(epentese), -frequencia) %>%
  mutate(tipo = tipo %>% as_factor %>% fct_inorder) %>%
...
[Your ggplot code]
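For reference, here is a rough end-to-end sketch of the same idea on data shaped like the question's. The `toy` tibble is a hypothetical stand-in for dadospb with only the two columns the plot needs, and the ggplot call is a simplified version of the asker's, so treat it as an illustration under those assumptions and not a drop-in replacement:

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical stand-in for dadospb (made-up values)
toy <- tibble(
  tipo     = c("ps", "ps", "ps", "ps", "ts", "ts", "ts", "ts", "ks", "ks"),
  epentese = c("1",  "1",  "1",  "0",  "1",  "0",  "0",  "0",  "1",  "1")
)

freq <- toy %>%
  count(tipo, epentese, name = "quantidade") %>%
  group_by(tipo) %>%
  mutate(frequencia = quantidade / sum(quantidade)) %>%
  ungroup() %>%
  # epentese == "1" rows first, highest share first
  arrange(desc(epentese), desc(frequencia)) %>%
  # Freeze that row order into the factor levels of tipo
  mutate(tipo = fct_inorder(tipo))

# Bars now follow the factor levels of tipo
p <- ggplot(freq, aes(x = tipo, y = frequencia, fill = epentese)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent)

print(levels(freq$tipo))
```

Printing `p` draws the plot. The remaining layers from the question (title, labels, theme) can be added on top unchanged.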
Upvotes: 5
Reputation: 23737
To help you translate the linked question and answer to your problem at hand:
``` r
library(tidyverse)
# devtools::install_github("alistaire47/read.so")
dadospb <- read.so::read_so("sujeito epentese vozeamento teste posicao palavra tipo ortografia cseguinte
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 a 1 1 P L alpes ps ces d_v
2 a 0 1 P L crepes ps ces d_v
3 a 0 0 P L chopes ps ces d_v
4 a 1 0 P L jipes ps ces d_d
5 a 1 0 P L naipes ps ces d_d
6 a 0 0 P L xaropes ps ces d_d
7 a 0 0 P L artes ts ces d_v
8 a 0 0 P L botes ts ces d_v
9 a 1 0 P L dentes ts ces d_v
10 a 0 0 P L potes ts ces d_d ")
df1 <-
  dadospb %>%
  group_by(tipo, epentese) %>%
  summarise(quantidade = n()) %>%
  mutate(frequencia = quantidade / sum(quantidade))
#> `summarise()` has grouped output by 'tipo'. You can override using the `.groups` argument.

fac_order <- df1 %>%
  filter(epentese == 1) %>%
  arrange(frequencia) %>%
  pull(tipo)

df1 %>%
  mutate(novotipo = factor(tipo, levels = fac_order)) %>%
  ggplot(aes(x = novotipo, y = frequencia, fill = epentese)) +
  geom_col()
```
Created on 2021-02-13 by the reprex package (v1.0.0)
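Two things are worth noting about the code above. `arrange(frequencia)` sorts in ascending order, and any tipo with no `epentese == 1` rows is missing from `fac_order`, so it becomes NA in `novotipo`. A possible variation that sorts in descending order and keeps such types is sketched below. The `toy` tibble and the column name `taxa` are hypothetical and only serve the illustration:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; "ks" has no epentese == "1" rows on purpose
toy <- tibble(
  tipo     = c("ps", "ps", "ps", "ps", "ts", "ts", "ts", "ts", "ks", "ks"),
  epentese = c("1",  "1",  "1",  "0",  "1",  "0",  "0",  "0",  "0",  "0")
)

df1 <- toy %>%
  count(tipo, epentese, name = "quantidade") %>%
  group_by(tipo) %>%
  mutate(frequencia = quantidade / sum(quantidade)) %>%
  ungroup()

# Share of epentese == "1" per tipo, computed from the raw rows so that
# types without any "1" rows get a rate of 0 instead of being dropped
fac_order <- toy %>%
  group_by(tipo) %>%
  summarise(taxa = mean(epentese == "1")) %>%
  arrange(desc(taxa)) %>%
  pull(tipo)

p <- df1 %>%
  mutate(novotipo = factor(tipo, levels = fac_order)) %>%
  ggplot(aes(x = novotipo, y = frequencia, fill = epentese)) +
  geom_col()

print(fac_order)
```

Printing `p` draws the bars from the highest to the lowest epenthesis rate.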
Upvotes: 0