Ian.T

ggplot geom_label: how to group by group AND x-axis values

I have this code to plot a histogram with y = count and x = a factor, and I added labels showing the percentage within each group:

library(ggplot2)

# df holds the group and IntervalDays columns shown below
ggplot(df, aes(IntervalDays, fill = group)) +
  geom_histogram(stat = "count") +
  geom_label(stat = "count",
             aes(label = round(..prop.. * 100, digits = 1), group = c(group)),
             position = position_stack(vjust = 0.5))

This plots: [plot image not available]

In this case each group's percentages add up to 100% across the two bars (for group A: 51.5 + 48.5 = 100). Can I change that so that I see the percentage of each colour within each bar? For example, for the (5,10] bar I want to know what percentage is red, green and blue, with the red, green and blue percentages summing to 100%.

This is what the data look like:

group   IntervalDays
A     [0,5]
C     (5,10]
A     (5,10]
A     [0,5]
C     (5,10]
A     [0,5]
B     (5,10]
A     (5,10]
C     (5,10]
B     (5,10]
A     [0,5]
A     [0,5]
C     [0,5]
.
.
.

Thanks a lot

I.

Answers (1)

kmacierzanka

This can be achieved by pre-computing the counts and the within-bar proportions with dplyr and then changing stat to "identity".

I am using this data from the sample you gave:

df <- structure(list(group = c("A", "C", "A", "A", "C", "A", "B", "A", 
"C", "B", "A", "A", "C"), IntervalDays = c("[0,5]", "(5,10]", 
"(5,10]", "[0,5]", "(5,10]", "[0,5]", "(5,10]", "(5,10]", "(5,10]", 
"(5,10]", "[0,5]", "[0,5]", "[0,5]")), row.names = c(NA, -13L
), class = "data.frame")

Applied to df, your plotting code reproduces the problem you describe (the only change I have made to your original plotting code so far is replacing geom_histogram with geom_bar, which makes more sense for this type of data):

library(ggplot2)

# original plot code, changed to geom_bar
ggplot(df, aes(x = IntervalDays, fill = group)) + 
        geom_bar(stat = "count") +
        geom_label(stat = "count", aes(label = round(..prop..*100, digits = 1), 
                                       group = c(group)), 
                   position = position_stack(vjust = 0.5))
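As an aside, the ..prop.. notation still works but was deprecated in ggplot2 3.4.0 in favour of after_stat(). The sketch below draws the same plot with after_stat(prop), assuming ggplot2 3.3.0 or later, and then uses layer_data() to inspect the values ggplot computed for the label layer. The object names p_group and labels_by_group are illustrative only.

```r
library(ggplot2)

# Sample data from the question (13 rows)
df <- data.frame(
  group = c("A", "C", "A", "A", "C", "A", "B", "A", "C", "B", "A", "A", "C"),
  IntervalDays = c("[0,5]", "(5,10]", "(5,10]", "[0,5]", "(5,10]", "[0,5]",
                   "(5,10]", "(5,10]", "(5,10]", "(5,10]", "[0,5]", "[0,5]",
                   "[0,5]")
)

# Same plot as above, with after_stat(prop) instead of ..prop..
p_group <- ggplot(df, aes(x = IntervalDays, fill = group)) +
  geom_bar() +
  geom_label(stat = "count",
             aes(label = after_stat(round(prop * 100, digits = 1)),
                 group = group),
             position = position_stack(vjust = 0.5))
p_group

# layer_data() returns the computed data of layer 2 (the labels);
# 'group' here is ggplot's integer group id, one per fill level
labels_by_group <- layer_data(p_group, 2)[, c("group", "x", "count", "prop", "label")]
labels_by_group
```

Summing the labels per group gives roughly 100 for each of A, B and C, which confirms that prop is computed within each group rather than within each bar.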

This is not what we want: it calculates proportions within each group (fill colour), not within each column (bar). To get the column proportions I have used dplyr as follows:

library(dplyr)

df_new <- df %>% group_by(group, IntervalDays) %>%
        summarise(sum = n()) %>% group_by(IntervalDays) %>%
        mutate(col_prop = sum/sum(sum))
> df_new
# A tibble: 5 x 4
# Groups:   IntervalDays [2]
  group IntervalDays   sum col_prop
  <chr> <chr>        <int>    <dbl>
1 A     (5,10]           2    0.286
2 A     [0,5]            5    0.833
3 B     (5,10]           2    0.286
4 C     (5,10]           3    0.429
5 C     [0,5]            1    0.167
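If you want to cross-check these column proportions without dplyr, one option is base R's table() plus prop.table() with margin = 2, which divides each cell by its column total. This is a sketch using the same sample data; the names counts and col_props are illustrative.

```r
# Sample data from the question (13 rows)
df <- data.frame(
  group = c("A", "C", "A", "A", "C", "A", "B", "A", "C", "B", "A", "A", "C"),
  IntervalDays = c("[0,5]", "(5,10]", "(5,10]", "[0,5]", "(5,10]", "[0,5]",
                   "(5,10]", "(5,10]", "(5,10]", "(5,10]", "[0,5]", "[0,5]",
                   "[0,5]")
)

# Contingency table: rows = group, columns = IntervalDays
counts <- table(df$group, df$IntervalDays)

# margin = 2: divide each cell by its column total,
# so the proportions within every bar sum to 1
col_props <- prop.table(counts, margin = 2)
round(col_props, 3)
```

Unlike the dplyr summary, table() also keeps the empty combination B / [0,5] as a 0 cell; the non-zero values match col_prop above.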

I then plotted df_new, keeping as much of your original code as possible. The main difference is that I changed stat from "count" to "identity" so that the values in sum are plotted as they are. Since we have calculated col_prop ourselves, that is what I assign to the label aesthetic:

ggplot(df_new, aes(x = IntervalDays, y = sum, fill = group)) +
        geom_bar(stat = "identity") +
        geom_label(stat = "identity", aes(label = round(col_prop*100, digits = 1),
                                          group = group),
                   position = position_stack(vjust = 0.5))
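A possible alternative, not part of the approach above, keeps stat = "count" and skips the dplyr step by computing the within-bar share directly inside after_stat(): each count divided by the total count at the same x position, obtained with base R's ave(). This sketch assumes ggplot2 3.3.0 or later and relies on after_stat() being evaluated on the combined stat output of all groups, so that ave() sees every group at each x. The names p_col and ld are illustrative.

```r
library(ggplot2)

# Sample data from the question (13 rows)
df <- data.frame(
  group = c("A", "C", "A", "A", "C", "A", "B", "A", "C", "B", "A", "A", "C"),
  IntervalDays = c("[0,5]", "(5,10]", "(5,10]", "[0,5]", "(5,10]", "[0,5]",
                   "(5,10]", "(5,10]", "(5,10]", "(5,10]", "[0,5]", "[0,5]",
                   "[0,5]")
)

p_col <- ggplot(df, aes(x = IntervalDays, fill = group)) +
  geom_bar() +
  geom_label(stat = "count",
             # ave(count, x, FUN = sum) gives the bar total for every row
             aes(label = after_stat(round(count / ave(count, x, FUN = sum) * 100,
                                          digits = 1)),
                 group = group),
             position = position_stack(vjust = 0.5))
p_col

# Check: the labels within each bar should add up to about 100
ld <- layer_data(p_col, 2)
tapply(ld$label, as.numeric(ld$x), sum)
```

Because the labels are rounded to one decimal, a bar can sum to 100.1 rather than exactly 100.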


To see the essence of what ggplot does behind the scenes when it calculates your original proportions, run something like the following, which is the pipeline above without the second group_by:

df %>% group_by(group, IntervalDays) %>%
        summarise(sum = n()) %>%
        mutate(col_prop = sum/sum(sum))
# A tibble: 5 x 4
# Groups:   group [3]
  group IntervalDays   sum col_prop
  <chr> <chr>        <int>    <dbl>
1 A     (5,10]           2    0.286
2 A     [0,5]            5    0.714
3 B     (5,10]           2    1    
4 C     (5,10]           3    0.75 
5 C     [0,5]            1    0.25
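The same group-wise proportions can be reproduced in base R with prop.table() and margin = 1, which divides each cell by its row (group) total instead of its column total. A minimal sketch with the sample data; counts and row_props are illustrative names.

```r
# Sample data from the question (13 rows)
df <- data.frame(
  group = c("A", "C", "A", "A", "C", "A", "B", "A", "C", "B", "A", "A", "C"),
  IntervalDays = c("[0,5]", "(5,10]", "(5,10]", "[0,5]", "(5,10]", "[0,5]",
                   "(5,10]", "(5,10]", "(5,10]", "(5,10]", "[0,5]", "[0,5]",
                   "[0,5]")
)

# Contingency table: rows = group, columns = IntervalDays
counts <- table(df$group, df$IntervalDays)

# margin = 1: divide each cell by its row (group) total,
# so each group's proportions sum to 1 across the bars
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```

Comparing margin = 1 with margin = 2 makes the difference explicit: the original labels normalise within each colour, whereas the desired labels normalise within each bar.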
