Reputation: 142
The current percentages in the bars are calculated from the total amount of data. I want each stack to add up to a full 100%. (Solved)
Also, the percentages should be rounded to the nearest integer. (Solved)
Edit: Remove all percentages less than or equal to 1%. (Solved)
Edit2: Make sure no labels overlap.
I've been googling for a while now. It seems there isn't a proper way to prevent labels from overlapping.
Possible solutions I discovered:
# Load libraries & packages =================================
library("ggplot2")
library("scales")
library("dplyr")
library("foreign")
library("tidyverse")
library("forcats")
# Data setup =================================
spss_file_path <- "D:\\Programming\\Testing\\2017-03-15_data_import&ggplot2\\Beispieldatensatz(fiktiv).sav"
exampledata <- read.spss(spss_file_path, use.value.labels = TRUE,
                         to.data.frame = TRUE, reencode = TRUE)
exampledata$V43 <- factor(exampledata$V43,
                          levels = c(1, 2, 3, 4, 5),
                          labels = c("1 Sehr zufrieden", "2", "3", "4", "5 Sehr unzufrieden"))
exampledata$V43 <- factor(exampledata$V43, levels = rev(unique(levels(exampledata$V43))))
exampledata$A_REF <- factor(exampledata$A_REF, levels = rev(unique(levels(exampledata$A_REF))))
exampledata$V101 <- factor(exampledata$V101, levels = rev(unique(levels(exampledata$V101))))
labels <- exampledata %>%
  filter(!is.na(V101), !is.na(V43)) %>%
  count(A_REF) %>%
  mutate(labels = paste(A_REF,"(n=", n, ")")) %>%
  select(A_REF, labels)
plot_data <- exampledata %>%
  filter(!is.na(V101), !is.na(V43)) %>%
  left_join(labels, by = "A_REF")
plot_data <- plot_data %>%
  group_by(labels) %>%
  summarize(`5 Sehr unzufrieden` = sum(ifelse(V43 == "5 Sehr unzufrieden", 1, 0)) / n(),
            `4` = sum(ifelse(V43 == "4", 1, 0)) / n(),
            `3` = sum(ifelse(V43 == "3", 1, 0)) / n(),
            `2` = sum(ifelse(V43 == "2", 1, 0)) / n(),
            `1 Sehr zufrieden` = sum(ifelse(V43 == "1 Sehr zufrieden", 1, 0)) / n()) %>%
  gather(key = Rating, value = prop, -labels)
plot_data$labels <- factor(plot_data$labels)
plot_data$Rating <- factor(plot_data$Rating) %>% fct_rev()
# Plot =================================
ggplot(plot_data, aes(x = labels, y = prop, fill = Rating)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent, breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1)) +
  labs(y = NULL, x = NULL, fill = NULL) +
  ggtitle(paste(attr(exampledata, "variable.labels")[77])) +
  theme_classic() +
  # NA_character_ suppresses the label; newer dplyr versions may reject NULL here
  geom_text(aes(label = if_else(prop > 0.02, scales::percent(round(prop, 2)), NA_character_)),
            position = position_fill(vjust = 0.5)) +
  coord_flip()
structure(list(exampledata.V101 = structure(c(2L, NA, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, NA, 2L, 2L, 2L, 1L, 2L, NA,
NA, NA, 1L, 1L, 2L, NA, 2L, 2L, 2L, NA, 2L, 2L, NA, NA, 1L, NA,
2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, NA, NA, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, NA, 1L, NA, 1L, NA,
1L, 2L, NA, NA, 2L, NA, 1L, 2L, 2L, NA, 2L, NA, 2L, 2L, 1L, 2L,
1L, 2L, 1L, 1L, 2L, 1L, NA, 2L, 2L, 2L, 2L, NA, 2L, 1L, 2L, 2L
), .Label = c("Weiblich", "Männlich"), class = "factor"), exampledata.A_REF = structure(c(18L,
18L, 18L, 18L, 18L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 16L, 18L,
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 16L, 18L, 18L, 16L, 18L,
16L, 18L, 18L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L,
16L, 18L, 18L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 17L, 16L, 18L,
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 17L, 18L, 18L,
16L, 18L, 16L, 18L, 18L, 16L, 16L, 18L, 18L, 18L, 18L, 18L, 18L,
18L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 16L, 18L,
16L, 16L, 18L, 18L, 18L, 17L, 16L, 18L), .Label = c("Zertifikat eines Aufbau- oder Ergänzungsstudiums",
"LA Berufliche Schulen", "LA Sonderschule", "LA Gymnasium", "LA Haupt- und Realschule",
"LA Grundschule", "Künstlerischer/musischer Abschluss", "Kirchlicher Abschluss",
"Staatsexamen (ohne Lehramt)", "Diplom Fachhochschule, Diplom I an Gesamthochschulen",
"Diplom Universität, Diplom II an Gesamthochschulen", "Sonstiges",
"Promotion", "Staatsexamen", "Magister", "Diplom", "Master",
"Bachelor"), class = "factor"), exampledata.V43 = structure(c(3L,
5L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 4L, 3L, 3L, 2L, NA, 4L, 5L, 5L,
4L, 4L, 4L, 4L, NA, 2L, 4L, 3L, 5L, 4L, 4L, 4L, NA, 4L, 4L, NA,
NA, 3L, 5L, 2L, 4L, 5L, 4L, 4L, 5L, 5L, 4L, NA, NA, 4L, NA, 3L,
4L, 5L, 5L, 2L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 5L, 4L, 5L, NA, 4L,
NA, 4L, NA, 4L, 5L, 4L, NA, 5L, NA, 4L, 4L, 4L, NA, 4L, NA, 5L,
4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 2L, 4L, 4L, 4L, 3L, 4L, NA, 4L,
5L, 5L, 4L), .Label = c("5 Sehr unzufrieden", "4", "3", "2",
"1 Sehr zufrieden"), class = "factor")), .Names = c("exampledata.V101",
"exampledata.A_REF", "exampledata.V43"), row.names = c(NA, 100L
), class = "data.frame")
Upvotes: 2
Views: 1192
Reputation: 8107
It's usually preferable to summarize your data before charting it. In my experience, having ggplot2 do the summarization for you is either limiting or makes it hard to get the display you want.
library(tidyverse)
library(forcats)
Because it's best to summarize your data before plotting it in ggplot2, the following bit of code calculates the proportion within each labels group that selected a particular answer on the scale. In the final step, I turn the data from wide to long so that all the proportions to be charted are in the same variable (which I call prop).
plot_data <- plot_data %>%
  group_by(labels) %>%
  summarize(`5 Sehr unzufrieden` = sum(ifelse(V43 == "5 Sehr unzufrieden", 1, 0)) / n(),
            `4` = sum(ifelse(V43 == "4", 1, 0)) / n(),
            `3` = sum(ifelse(V43 == "3", 1, 0)) / n(),
            `2` = sum(ifelse(V43 == "2", 1, 0)) / n(),
            `1 Sehr zufrieden` = sum(ifelse(V43 == "1 Sehr zufrieden", 1, 0)) / n()) %>%
  gather(key = Rating, value = prop, -labels)
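As a rough alternative sketch (not what the code above does), the same per-group proportions can be computed without naming every answer by counting and then dividing within each group. The toy data frame and its values are made up for illustration; only the column names labels and V43 come from the question. Note that this version produces long data directly and omits answer categories that never occur in a group, whereas the summarize version reports them as 0.

```r
library(dplyr)

# Toy data standing in for the survey (hypothetical values)
toy <- tibble(
  labels = c("A", "A", "A", "A", "B", "B"),
  V43    = c("1", "1", "2", "3", "2", "2")
)

# Count each answer per group, then turn counts into within-group proportions
toy_props <- toy %>%
  count(labels, V43) %>%
  group_by(labels) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Each group's proportions should add up to 1
group_totals <- toy_props %>%
  group_by(labels) %>%
  summarize(total = sum(prop))
```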
Categorical variables are best stored as factors, which makes it easier to control things like their order and colours, so that is what the following does. Initially, my code had the scale labels (which I called Rating in the gather function above) in the reverse of your order, so I'm using fct_rev from the forcats package to reverse it back.
plot_data$labels <- factor(plot_data$labels)
plot_data$Rating <- factor(plot_data$Rating) %>% fct_rev()
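To see what fct_rev does in isolation, here is a minimal sketch on a standalone factor. The variable names rating and rating_rev are only for illustration; the level names are the ones from the question.

```r
library(forcats)

# factor() sorts the levels alphabetically by default
rating <- factor(c("1 Sehr zufrieden", "2", "3", "4", "5 Sehr unzufrieden"))
levels(rating)

# fct_rev() keeps the values but reverses the level order
rating_rev <- fct_rev(rating)
levels(rating_rev)
```

Only the level order changes, and that order is what ggplot2 uses for stacking and for the legend.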
For the chart below, I made just a couple of changes. The most notable is that I'm using geom_col instead of geom_bar. Behind the scenes, geom_col is the same as geom_bar(stat = "identity"); it's just quicker to type. We're essentially telling ggplot2 to chart the values as they are instead of counting rows of raw data. Because of that, I need to specify the y aesthetic to indicate which values to chart, so I map prop to y in the initial ggplot call.
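As a quick check of that equivalence, the sketch below builds the same toy chart both ways and compares the bar geometry ggplot2 computes. The data frame toy and its values are hypothetical, not the survey data.

```r
library(ggplot2)

# Toy proportions (hypothetical values, not the survey data)
toy <- data.frame(
  group  = c("A", "A", "B", "B"),
  Rating = c("1", "2", "1", "2"),
  prop   = c(0.25, 0.75, 0.60, 0.40)
)

p_col <- ggplot(toy, aes(x = group, y = prop, fill = Rating)) + geom_col()
p_bar <- ggplot(toy, aes(x = group, y = prop, fill = Rating)) +
  geom_bar(stat = "identity")

# Compare the bar positions and heights computed for each version
bars_col <- layer_data(p_col)[, c("x", "ymin", "ymax")]
bars_bar <- layer_data(p_bar)[, c("x", "ymin", "ymax")]
same_bars <- isTRUE(all.equal(bars_col, bars_bar))
```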
# Plot =================================
ggplot(plot_data, aes(x = labels, y = prop, fill = Rating)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent, breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1)) +
  labs(y = NULL, x = NULL, fill = NULL) +
  ggtitle(paste(attr(exampledata, "variable.labels")[77])) +
  theme_classic() +
  # NA_character_ suppresses the label; newer dplyr versions may reject NULL here
  geom_text(aes(label = if_else(prop > 0.01, scales::percent(round(prop, 2)), NA_character_)),
            position = position_fill(vjust = 0.5)) +
  coord_flip()
The only other line I changed is the geom_text call above. I added an if_else function so that the label is shown only if it's above 1% and omitted otherwise (1% or less). I also used the round function so that the percentages have no decimals. Remember that prop is a proportion, so you need to round to 2 decimal places: 0.3456 becomes 0.35, which is displayed as 35%.
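The labelling rule can be tried on its own, outside the plot. The sketch below is a variation, not the exact call used in the chart: it assumes a version of scales that supports the accuracy argument of percent (1.0.0 or later) and uses it instead of round, which keeps the labels at whole percentages regardless of which values are in the vector. The vector prop holds made-up proportions.

```r
library(dplyr)
library(scales)

# Hypothetical proportions: two large segments, two at or below the 1% cutoff
prop <- c(0.3456, 0.008, 0.01, 0.6364)

# Whole-percent labels above 1%, NA (no label drawn) otherwise
label <- if_else(prop > 0.01, percent(prop, accuracy = 1), NA_character_)
label
```

When this expression is used inside aes(label = ...), geom_text drops the NA labels with a warning about removed rows, which is harmless here.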
Upvotes: 2
Reputation: 6151
Not sure if this will get you towards where you want to go, but here's a simple version based on some code I made a little way back. Didn't include all the ggplot2 bits as I agree with @Phil that the summary should be done before plotting.
devtools::install_github("ekstroem/MESS")
library(MESS)
x <- c(35, 34.6, 12, 5, .1, .99, 1.2, 11.11) # Input percentages
round_percent(x)
which gives
[1] 35 35 12 5 0 1 1 11
or you could have
round_percent(x[x>1])
which gives
[1] 36 35 12 5 1 11
You'd need to make sure the colouring matches the remaining groups tho' so there is still some work left.
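If you'd rather not install a package for this, the idea can be sketched in base R with the largest remainder method: scale the values to sum to 100, round everything down, then hand the missing points to the values with the largest fractional parts. The function name round_to_100 is made up for this sketch, and it isn't necessarily how round_percent is implemented internally.

```r
# Largest remainder rounding: integer percentages that add up to exactly 100
# (round_to_100 is a hypothetical helper, not part of MESS)
round_to_100 <- function(x) {
  scaled  <- x / sum(x) * 100
  # Small tolerance so values like 34.999999999 are not floored one too low
  floored <- floor(scaled + 1e-9)
  shortfall <- round(100 - sum(floored))
  if (shortfall > 0) {
    # Give one extra point to the entries with the largest fractional parts
    bump <- order(scaled - floored, decreasing = TRUE)[seq_len(shortfall)]
    floored[bump] <- floored[bump] + 1
  }
  floored
}

x <- c(35, 34.6, 12, 5, .1, .99, 1.2, 11.11)
rounded_all  <- round_to_100(x)
rounded_kept <- round_to_100(x[x > 1])
```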
Upvotes: 1