Alex S. Sandoval

Reputation: 107

ggplot plotting multiple bars

I have counts of salary data broken up into neighborhoods (hood) and further broken down into income brackets, along with the margin-of-error bounds (min and max) for each income bracket. I want to plot the income brackets with their margins of error for each neighborhood. Below is a subset of my data:

hood    PHE_Less than 20k   PHE_Less than 20k max   PHE_Less than 20k min  PHE_20k to 35k   PHE_20k to 35k max  PHE_20k to 35k min
   a                  291                  368.38                  213.62            250                 331.15             168.85
   b                  220                  283.86                  156.14            125                 185.47              64.53
   c                  226                  296.82                  155.18            306                 394.33             217.67
   d                  25                    41.82                    8.18             73                 107.94              38.06

And this is my R code:

library(ggplot2)

# One bar per neighborhood for a single income bracket
PHE_20k.to.35k <- ggplot(data = mydata2,
                         aes(x = hood,
                             y = PHE_20k.to.35k,
                             fill = hood)) +
  geom_bar(stat = "identity") +
  # Error bars run from the lower (min) to the upper (max) margin of error
  geom_errorbar(aes(ymin = PHE_20k.to.35k.min,
                    ymax = PHE_20k.to.35k.max),
                width = 0.2) +
  ylab("20k to 35k") +
  xlab("") +
  guides(fill = "none")

PHE_20k.to.35k

This only gets me one income bracket per hood. How can I add the other one?

I want to show both income brackets, PHE_Less than 20k and PHE_20k to 35k, with their margins of error for each hood, and have a legend showing which income bracket is which. In reality I have 4 income brackets per neighborhood, but solving it for these 2 will help me finish the rest.

Any help will be appreciated!

Upvotes: 0

Views: 125

Answers (1)

camille

Reputation: 16832

The first thing you want to do is get your data into a proper shape for ggplot2. The philosophy of ggplot2 is that data should be in a long format, where you can assign variables to different aesthetics, such as color or position, and the visual elements are created from them. One hint that you need to reshape your data is that your columns have very similar names; that's a sign that they contain the same kind of data.
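As a minimal sketch of that idea (not the code used further down), here is tidyr's pivot_longer(), the newer counterpart of gather(), applied to a hypothetical two-row cut of the question's table. The old column names turn into values of a new bracket variable:

```r
library(tidyr)

# Hypothetical cut-down wide table: two columns holding the same kind of value
wide <- data.frame(
  hood = c("a", "b"),
  `Less than 20k` = c(291, 220),
  `20k to 35k` = c(250, 125),
  check.names = FALSE
)

# Long format: one row per hood/bracket pair; "bracket" can now be mapped
# to an aesthetic such as fill
long <- pivot_longer(wide, cols = -hood, names_to = "bracket", values_to = "count")
long
```

With bracket stored as a column, aes(fill = bracket) gives one bar color per bracket plus a legend, which is exactly what the wide layout can't do.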

Think about what you want to plot and how the different elements fit together. If I'm understanding the question correctly, the position along the x-axis depends on both the neighborhood and the bracket. The position of the errorbars depends on the neighborhood and the bracket as well, and the endpoints of the errorbars come from the min and max values.

I gathered the data into a long format and used regex functions to extract the bracket label and the measurement type (min, max, or neither) from the key column, which holds what were previously the column names. Rows where no type was extracted are the measurements themselves, so I filled those in as "measure" with replace_na, then spread the data so there is a min, max, and measure value for each combination of bracket and neighborhood.
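To see what the two patterns pull out, here is a quick check on a few of the question's column names, assuming they keep their spaces. The bracket pattern (\\w+\\s){2}\\w+ matches three space-separated words after PHE_, so it assumes every bracket label is three words long:

```r
library(stringr)

# A few of the question's column names (assumed to keep their spaces)
keys <- c("PHE_Less than 20k", "PHE_Less than 20k max", "PHE_20k to 35k min")

# Three space-separated words following "PHE_"
bracket <- str_extract(keys, "(?<=PHE_)(\\w+\\s){2}\\w+")
# "min" or "max" where present, NA for the estimate column
type <- str_extract(keys, "(min|max)")

bracket  # "Less than 20k" "Less than 20k" "20k to 35k"
type     # NA "max" "min"
```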

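library(tidyverse)

# Assumes df keeps the original column names with spaces (e.g. "PHE_20k to 35k max");
# the \\s in the bracket pattern does not match dotted names like PHE_20k.to.35k.max
df_tidy <- df %>%
  # one row per hood and original column name
  gather(key = key, value = value, -hood) %>%
  # bracket label: the three words after "PHE_"
  mutate(bracket = str_extract(key, "(?<=PHE_)(\\w+\\s){2}\\w+")) %>%
  # "min" or "max"; NA for the estimate itself
  mutate(type = str_extract(key, "(min|max)")) %>%
  select(-key) %>%
  replace_na(list(type = "measure")) %>%
  # one column each for max, measure and min
  spread(key = type, value = value)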

df_tidy
#>   hood       bracket    max measure    min
#> 1    a    20k to 35k 331.15     250 168.85
#> 2    a Less than 20k 368.38     291 213.62
#> 3    b    20k to 35k 185.47     125  64.53
#> 4    b Less than 20k 283.86     220 156.14
#> 5    c    20k to 35k 394.33     306 217.67
#> 6    c Less than 20k 296.82     226 155.18
#> 7    d    20k to 35k 107.94      73  38.06
#> 8    d Less than 20k  41.82      25   8.18
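gather() and spread() still work, but current tidyr marks them as superseded by pivot_longer() and pivot_wider(). Below is a sketch of the same reshaping with those functions. It swaps the three-word bracket regex for stripping the PHE_ prefix and any min/max suffix, which assumes every column name follows the pattern PHE_<bracket> optionally followed by min or max. It also turns dots back into spaces, so it should cope with read.csv()-style names such as PHE_20k.to.35k.max, which the question's code suggests the real data frame has:

```r
library(dplyr)
library(tidyr)
library(stringr)

# The question's subset, keeping the spaces in the column names
df <- data.frame(
  hood = c("a", "b", "c", "d"),
  `PHE_Less than 20k`     = c(291, 220, 226, 25),
  `PHE_Less than 20k max` = c(368.38, 283.86, 296.82, 41.82),
  `PHE_Less than 20k min` = c(213.62, 156.14, 155.18, 8.18),
  `PHE_20k to 35k`        = c(250, 125, 306, 73),
  `PHE_20k to 35k max`    = c(331.15, 185.47, 394.33, 107.94),
  `PHE_20k to 35k min`    = c(168.85, 64.53, 217.67, 38.06),
  check.names = FALSE
)

df_tidy <- df %>%
  # One row per hood and original column name
  pivot_longer(-hood, names_to = "key", values_to = "value") %>%
  mutate(
    # "min"/"max" suffix, or "measure" for the estimate itself
    type = coalesce(str_extract(key, "(min|max)$"), "measure"),
    # Drop the prefix and suffix; dots become spaces for dotted names
    bracket = key %>%
      str_remove("^PHE_") %>%
      str_remove("[ .](min|max)$") %>%
      str_replace_all("\\.", " ")
  ) %>%
  select(-key) %>%
  # One column each for measure, max and min
  pivot_wider(names_from = type, values_from = value)

df_tidy
```

Unlike spread(), pivot_wider() keeps rows and columns in order of first appearance rather than sorting them, so the printout is ordered differently from the one above, but the values are the same.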

From there, the data is ready to plot, with dodging to place the bars and errorbars for each bracket side by side. One issue you'll notice is choosing how to fill the bars and color the errorbars: an errorbar is hard to see where it overlaps a bar of the same color. One option is to decrease the alpha of the bars.

ggplot(df_tidy, aes(x = hood, y = measure, fill = bracket)) +
  geom_col(position = position_dodge(width = 0.9), alpha = 0.5) +
  geom_errorbar(aes(ymin = min, ymax = max, color = bracket), position = position_dodge(width = 0.9), width = 0.4)
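Both layers use position_dodge(width = 0.9) on purpose: bars are 0.9 wide by default, and dodging the narrower errorbars by that same width centers each errorbar on its bar. A small sketch that checks this with ggplot2's layer_data(), on a hypothetical two-hood subset of df_tidy (the names dodge, p, bar_x and err_x are illustrative):

```r
library(ggplot2)

# Hypothetical two-hood subset in the same shape as df_tidy
df_tidy <- data.frame(
  hood    = c("a", "a", "b", "b"),
  bracket = c("20k to 35k", "Less than 20k", "20k to 35k", "Less than 20k"),
  measure = c(250, 291, 125, 220),
  min     = c(168.85, 213.62, 64.53, 156.14),
  max     = c(331.15, 368.38, 185.47, 283.86)
)

# Define the dodge once so both layers are guaranteed to share it
dodge <- position_dodge(width = 0.9)

p <- ggplot(df_tidy, aes(x = hood, y = measure, fill = bracket)) +
  geom_col(position = dodge, alpha = 0.5) +
  geom_errorbar(aes(ymin = min, ymax = max, color = bracket),
                position = dodge, width = 0.4)

# Horizontal centers after dodging: bars (layer 1) and errorbars (layer 2)
bar_x <- sort(unique(as.numeric(layer_data(p, 1)$x)))
err_x <- sort(unique(as.numeric(layer_data(p, 2)$x)))
```

If geom_errorbar() used position = "dodge" instead, it would dodge by its own 0.4 width, and the errorbars would sit closer together than the bars they belong to.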

Another option is to set the fills and colors manually, choosing similar hues but making the errorbars darker.

ggplot(df_tidy, aes(x = hood, y = measure, fill = bracket)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = min, ymax = max, color = bracket), position = position_dodge(width = 0.9), width = 0.4) +
  scale_fill_manual(values = c("skyblue", "tomato")) +
  scale_color_manual(values = c("skyblue4", "tomato4"))
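Two details may help when extending this to all four brackets. Converting bracket to a factor puts the brackets in income order instead of the default alphabetical order, and passing named vectors to scale_fill_manual() and scale_color_manual() ties each color to a specific bracket. Because fill and color share one title, ggplot2 draws a single combined legend; if you rename one with labs(), rename the other to match. A sketch on a hypothetical two-hood subset (the colors and variable names are illustrative):

```r
library(ggplot2)

# Hypothetical two-hood subset in the same shape as df_tidy
df_tidy <- data.frame(
  hood    = c("a", "a", "b", "b"),
  bracket = c("20k to 35k", "Less than 20k", "20k to 35k", "Less than 20k"),
  measure = c(250, 291, 125, 220),
  min     = c(168.85, 213.62, 64.53, 156.14),
  max     = c(331.15, 368.38, 185.47, 283.86)
)

# Income order instead of the default alphabetical order
df_tidy$bracket <- factor(df_tidy$bracket,
                          levels = c("Less than 20k", "20k to 35k"))

# Named vectors map each color to a bracket; add one pair per extra bracket
fills  <- c("Less than 20k" = "skyblue",  "20k to 35k" = "tomato")
colors <- c("Less than 20k" = "skyblue4", "20k to 35k" = "tomato4")

p <- ggplot(df_tidy, aes(x = hood, y = measure, fill = bracket)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = min, ymax = max, color = bracket),
                position = position_dodge(width = 0.9), width = 0.4) +
  scale_fill_manual(values = fills) +
  scale_color_manual(values = colors) +
  # Same title for fill and color keeps a single combined legend
  labs(x = NULL, y = "Count", fill = "Income bracket", color = "Income bracket")

p
```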

I'll leave those aesthetic decisions to you.

Upvotes: 1
