KaC

Reputation: 287

Create a ggplot with grouped factor levels

This is a variation on a question asked here: Group factor levels in ggplot.

I have a dataframe:

df <- data.frame(respondent = factor(c(1, 2, 3, 4, 5, 6, 7)),
                 location = factor(c("California", "Oregon", "Mexico",
                                     "Texas", "Canada", "Mexico", "Canada")))

There are three separate levels related to the US. I don't want to collapse them, as the distinction between states is useful for data analysis. I would like, however, a basic barplot that combines the three US states and stacks them on top of one another, so that there are three bars in the barplot (Canada, Mexico, and US), with the US bar divided into the three states.

If the state factor levels had "US" in their names, e.g. "US: California", I could use

library(tidyverse)
with_states <- df %>%
  separate(location, into = c("Country", "State"), sep = ": ") %>%
  replace_na(list(State = "Other")) %>%
  mutate(State = as.factor(State) %>%
           fct_relevel("Other", after = Inf))

to achieve the desired outcome. But how can this be done when R doesn't know that the three states are in the US?

Upvotes: 0

Views: 186

Answers (1)

divibisan

Reputation: 12155

If you look at the previous example, all that separate() and replace_na() do is split the location variable into a country column and a state column:

df

  respondent       location
1          1 US: California
2          2     US: Oregon
3          3         Mexico
...

df %>%
    separate(location, into = c("Country", "State"), sep = ": ") %>%
    replace_na(list(State = "Other"))

  respondent Country      State
1          1      US California
2          2      US     Oregon
3          3  Mexico      Other
...

So really all you need to do is get your data into this format, with a column for country and a column for state/province.

There are many ways to do this yourself, and often your data will already be in this format. If it isn't, the easiest fix is to join against a table that maps each location to its country:

df
  respondent   location
1          1 California
2          2     Oregon
3          3     Mexico
4          4      Texas
5          5     Canada
6          6     Mexico
7          7     Canada

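# Lookup table mapping each US state to its country
state_mapping <- data.frame(state = c("California", "Oregon", "Texas"),
                            country = c("US", "US", "US"),
                            stringsAsFactors = FALSE)

df %>%
    # location is a factor in df; convert it so the join key and if_else() types match
    mutate(location = as.character(location)) %>%
    left_join(state_mapping, by = c("location" = "state")) %>%
    # Locations with no match in the lookup table are already countries
    mutate(country = if_else(is.na(country), location, country))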


  respondent   location country
1          1 California      US
2          2     Oregon      US
3          3     Mexico  Mexico
4          4      Texas      US
5          5     Canada  Canada
6          6     Mexico  Mexico
7          7     Canada  Canada
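With only a handful of states, a full lookup table may be more than you need. As a rough sketch of one of the other ways to get the same country column, you could test membership in a vector of state names. The name us_states is just an illustrative choice, and location is assumed to be a character column here rather than a factor:

```r
library(dplyr)

# Same data as in the question, with location stored as character (assumption)
df <- data.frame(respondent = factor(1:7),
                 location = c("California", "Oregon", "Mexico",
                              "Texas", "Canada", "Mexico", "Canada"),
                 stringsAsFactors = FALSE)

# Hypothetical helper vector listing the locations that belong to the US
us_states <- c("California", "Oregon", "Texas")

# Anything in us_states becomes "US"; everything else is already a country
with_country <- df %>%
    mutate(country = if_else(location %in% us_states, "US", location))
```

The join scales better if the mapping grows or already lives in another table, while the %in% version is easier to read for a short, fixed list.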

Once you've got it in this format, you can just do what the other question suggested.
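To make that last step concrete, here is a minimal sketch of the whole pipeline, assuming the same state_mapping lookup as above and location stored as character. The names plot_data and p are only for illustration, and the plot itself is a plain geom_bar() stacked by state, which may differ in detail from what the linked question does:

```r
library(dplyr)
library(ggplot2)

# Same data as in the question, with location stored as character (assumption)
df <- data.frame(respondent = factor(1:7),
                 location = c("California", "Oregon", "Mexico",
                              "Texas", "Canada", "Mexico", "Canada"),
                 stringsAsFactors = FALSE)

# Lookup table mapping each US state to its country
state_mapping <- data.frame(state = c("California", "Oregon", "Texas"),
                            country = "US",
                            stringsAsFactors = FALSE)

plot_data <- df %>%
    left_join(state_mapping, by = c("location" = "state")) %>%
    # state is computed first, while country is still NA for non-US rows
    mutate(state = if_else(is.na(country), "Other", location),
           country = if_else(is.na(country), location, country))

# One bar per country; the US bar is split by state via the fill aesthetic
p <- ggplot(plot_data, aes(x = country, fill = state)) +
    geom_bar()
p
```

Canada and Mexico each get a single "Other" segment, while the US bar is stacked from its three states. If you want "Other" to come last in the legend, the fct_relevel("Other", after = Inf) step from the question can be applied to state before plotting.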

Upvotes: 1
