Reputation: 1166
I have this code to plot a histogram with y = count and x = a factor, and I added labels with the group percentage:
ggplot(aes(IntervalDays, fill = group)) +
  geom_histogram(stat = "count") +
  geom_label(stat = "count",
             aes(label = round(..prop..*100, digits = 1),
                 group = c(group)),
             position = position_stack(vjust = 0.5))
In this case the percentages for each group add up to 100% across the two bars (for group A: 51.5 + 48.5 = 100). Can I change that so I see the percentage of each color within each bar? For example, for the bar (5,10] I want to know what percentage is red/green/blue, so that the red, green and blue percentages in that bar sum to 100%.
This is what the data looks like:
group IntervalDays
A [0,5]
C (5,10]
A (5,10]
A [0,5]
C (5,10]
A [0,5]
B (5,10]
A (5,10]
C (5,10]
B (5,10]
A [0,5]
A [0,5]
C [0,5]
.
.
.
Thanks a lot
I.
Upvotes: 0
Views: 1116
Reputation: 825
This can be achieved by manipulating your data slightly with dplyr and then changing stat to "identity".
I am using this data from the sample you gave:
df <- structure(list(group = c("A", "C", "A", "A", "C", "A", "B", "A",
"C", "B", "A", "A", "C"), IntervalDays = c("[0,5]", "(5,10]",
"(5,10]", "[0,5]", "(5,10]", "[0,5]", "(5,10]", "(5,10]", "(5,10]",
"(5,10]", "[0,5]", "[0,5]", "[0,5]")), row.names = c(NA, -13L
), class = "data.frame")
Your plotting code, when applied to df, gives the plot below. The only thing I have changed so far from your original plotting code is geom_histogram to geom_bar, as this makes more sense for categorical data like yours:
library(ggplot2)
# original plot code, changed to geom_bar
ggplot(df, aes(x = IntervalDays, fill = group)) +
  geom_bar(stat = "count") +
  geom_label(stat = "count",
             aes(label = round(..prop..*100, digits = 1),
                 group = c(group)),
             position = position_stack(vjust = 0.5))
We don't want this, as it calculates proportions within each group (across the bars), not within each column (bar). To get the column proportions I have used dplyr as follows:
library(dplyr)
df_new <- df %>%
  group_by(group, IntervalDays) %>%
  summarise(sum = n()) %>%       # count rows per group/interval pair
  group_by(IntervalDays) %>%     # regroup by bar
  mutate(col_prop = sum/sum(sum))  # share of each group within its bar
> df_new
# A tibble: 5 x 4
# Groups:   IntervalDays [2]
  group IntervalDays   sum col_prop
  <chr> <chr>        <int>    <dbl>
1 A     (5,10]           2    0.286
2 A     [0,5]            5    0.833
3 B     (5,10]           2    0.286
4 C     (5,10]           3    0.429
5 C     [0,5]            1    0.167
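As a cross-check on the column proportions above, the same numbers can be reproduced in base R with table() and prop.table(). This is only a sketch for verification, not part of the plotting code; the names counts and col_props are mine, and the data frame is the sample df from this answer, rebuilt so the snippet runs on its own.

```r
# Sample data copied from the answer (13 rows)
df <- data.frame(
  group = c("A", "C", "A", "A", "C", "A", "B", "A", "C", "B", "A", "A", "C"),
  IntervalDays = c("[0,5]", "(5,10]", "(5,10]", "[0,5]", "(5,10]", "[0,5]",
                   "(5,10]", "(5,10]", "(5,10]", "(5,10]", "[0,5]", "[0,5]",
                   "[0,5]")
)

# Rows = group, columns = IntervalDays
counts <- table(df$group, df$IntervalDays)

# margin = 2 divides every column (one bar) by its own total
col_props <- prop.table(counts, margin = 2)

print(round(col_props, 3))
print(colSums(col_props))  # every bar should sum to 1
```

Note that table() also reports the empty combination (group B in [0,5]) as a proportion of 0, whereas the dplyr summary simply has no row for it.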
I have then plotted df_new using as much of your original code as possible. The main difference is that I have changed stat from "count" to "identity", so that the values in sum are plotted explicitly. Since we have calculated col_prop ourselves, this is what I assign to the label argument:
ggplot(df_new, aes(x = IntervalDays, y = sum, fill = group)) +
  geom_bar(stat = "identity") +
  geom_label(stat = "identity",
             aes(label = round(col_prop*100, digits = 1),
                 group = group),
             position = position_stack(vjust = 0.5))
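If you want the labels to show a percent sign, one possible variant is sketched below. It is the same idea as the code above, just written with count() instead of group_by() + summarise(), and geom_col() instead of geom_bar(stat = "identity"). The names df_pct, n, label and p are my own choices, and the data frame is the sample df from this answer, rebuilt so the snippet runs on its own.

```r
library(dplyr)
library(ggplot2)

# Sample data copied from the answer (13 rows)
df <- data.frame(
  group = c("A", "C", "A", "A", "C", "A", "B", "A", "C", "B", "A", "A", "C"),
  IntervalDays = c("[0,5]", "(5,10]", "(5,10]", "[0,5]", "(5,10]", "[0,5]",
                   "(5,10]", "(5,10]", "(5,10]", "(5,10]", "[0,5]", "[0,5]",
                   "[0,5]")
)

df_pct <- df %>%
  count(group, IntervalDays, name = "n") %>%  # one row per group/interval pair
  group_by(IntervalDays) %>%                  # percentages are per bar
  mutate(label = paste0(round(100 * n / sum(n), 1), "%")) %>%
  ungroup()

p <- ggplot(df_pct, aes(x = IntervalDays, y = n, fill = group)) +
  geom_col() +                                # bar height = precomputed n
  geom_label(aes(label = label, group = group),
             position = position_stack(vjust = 0.5))
print(p)
```

The bar heights are still counts; only the label text changes, so the stacking and label positions behave as in the plot above.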
You can have a look at the essence of what ggplot is doing behind the scenes when calculating your original proportions. It is something like this, without the second group_by that we saw above:
df %>%
  group_by(group, IntervalDays) %>%
  summarise(sum = n()) %>%          # result is still grouped by group
  mutate(col_prop = sum/sum(sum))   # so this is the share within each group
# A tibble: 5 x 4
# Groups:   group [3]
  group IntervalDays   sum col_prop
  <chr> <chr>        <int>    <dbl>
1 A     (5,10]           2    0.286
2 A     [0,5]            5    0.714
3 B     (5,10]           2    1
4 C     (5,10]           3    0.75
5 C     [0,5]            1    0.25
Upvotes: 2