Gerry

Reputation: 117

Plot factors in order with grouping variable

I'm working with R. I have a dataframe that looks like this:

df <- (structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
       2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
       5L, 5L), .Label = c("2013", "2014", "2015", "2016", "2017"), 
       class = "factor"), user = structure(c(2L, 4L, 1L, 3L, 5L, 2L, 4L, 1L,
       3L, 5L, 2L, 4L, 1L, 3L, 5L, 2L, 4L, 1L, 3L, 5L, 2L, 4L, 1L, 3L, 5L),
       .Label = c("John", "Laura", "Liz", "Mark", "Martha"), class = "factor"), 
        spent = c(56, 64, 69, 38, 93, 70, 29, 94, 56, 76, 48, 17, 
        74, 67, 100, 29, 16, 23, 10, 51, 72, 35, 77, 83, 17)), 
        class = "data.frame", row.names = c(NA, -25L)))

I'm trying to generate a histogram with the "spent" variable on the y-axis, the "user" on the x-axis, and a facet for each year. For each year, the users should be ordered based on the "spent" variable.

I tried something like df$user2 = factor(df$user, levels = df$user[order(df$year, df$spent)]), but I get an error saying that the 6th factor level is duplicated.

Any help is greatly appreciated!


Upvotes: 1

Views: 80

Answers (1)

eipi10

Reputation: 93761

What you are describing is a bar plot. A histogram shows the distribution of a single continuous variable (for example, hist(rnorm(100))).

Your ordering statement gave an error because each level in a factor variable (each unique value of user in this case) can appear only once in the levels argument. factor allows you to set a new ordering of the unique levels of user. For example, instead of alphabetic ordering, we can do levels=c("Liz","Laura","Mark","John","Martha"). Then df[order(df$user),] will sort the data frame by the new order of user and df[order(df$year, df$user),] will sort by year, then user. However, we can't use factor to get a different order of user for each year.
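To make the point about unique levels concrete, here is a minimal base R sketch. The short user vector is a made-up stand-in for the question's user column (an assumption for illustration, not the real data), and the condition is captured with tryCatch rather than printed, since the exact wording of the message can differ between R versions.

```r
# Stand-in for the user column: the same five names repeated (illustrative only)
user <- factor(c("Laura", "Mark", "John", "Liz", "Martha",
                 "Laura", "Mark", "John", "Liz", "Martha"))

# Passing the repeated values themselves as levels fails,
# because every level may be listed only once
res <- tryCatch(
  factor(user, levels = user),
  error   = function(e) e,
  warning = function(w) w
)
print(conditionMessage(res))

# Listing each unique value once, in the order we want, works
user2 <- factor(user, levels = c("Liz", "Laura", "Mark", "John", "Martha"))
print(levels(user2))

# order() now follows the level order instead of alphabetical order
sorted <- user2[order(user2)]
print(as.character(sorted))
```

The new level order applies to the whole vector at once, which is why a single factor cannot give a different order of user in each year.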

Based on your description, it looks like you want a faceted plot, but with a different x-axis order in each facet. You can do this in ggplot if you create a new variable that sets the x-axis order (I've called this variable r below) and then use the labels argument in scale_x_continuous to get the desired axis labels.

library(tidyverse)

df = df %>% 
  # Convert year back to numeric
  mutate(year = as.numeric(as.character(year))) %>% 
  # Sort data into the order we want
  arrange(year, spent) %>% 
  # Create a new variable with the desired row order
  mutate(r = row_number())

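ggplot(df, aes(r, spent)) +
  geom_col() + 
  # Free x scales so each facet shows only its own values of r
  facet_grid(. ~ year, scales="free_x") +
  # Label each position r with the corresponding user name
  scale_x_continuous(breaks=df$r, labels=as.character(df$user))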
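As a rough check of why the position variable r gives a per-facet order, the sketch below rebuilds the same values in base R and inspects r without plotting. The compact data.frame() call and the helper names r_range and sorted_within_year are my own for illustration; order() plus seq_len() is used here as a base R stand-in for arrange() and row_number().

```r
# Same values as the question's data, rebuilt compactly with year kept numeric
df <- data.frame(
  year  = rep(2013:2017, each = 5),
  user  = rep(c("Laura", "Mark", "John", "Liz", "Martha"), times = 5),
  spent = c(56, 64, 69, 38, 93, 70, 29, 94, 56, 76, 48, 17,
            74, 67, 100, 29, 16, 23, 10, 51, 72, 35, 77, 83, 17)
)

# Base R stand-in for arrange(year, spent) followed by row_number()
df <- df[order(df$year, df$spent), ]
df$r <- seq_len(nrow(df))

# Each year owns its own block of r values, so with free x scales
# a facet only shows the positions that belong to that year
r_range <- tapply(df$r, df$year, range)
print(r_range)

# Within each year, spent never decreases as r increases
sorted_within_year <- tapply(df$spent, df$year, function(s) !is.unsorted(s))
print(sorted_within_year)

# The r-to-user lookup that the axis labels rely on (rows of the first facet)
print(head(df[, c("r", "user", "spent")], 5))
```

Because every row has a unique r, the same user can sit at a different position in each facet, which a single set of factor levels cannot express.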


The above plot seems confusing due to the user order changing in each facet. Maybe something like this would work better:

ggplot(df, aes(year, spent, colour=user, group=user)) +
  geom_line() + 
  geom_point() +
  geom_text(data=df %>% filter(year==min(year)), aes(label=user), 
            hjust=1, position=position_nudge(x=-0.1), size=3) +
  expand_limits(y=0, x=2012.5) +
  theme_classic() +
  # Hide the colour legend; the lines are labelled directly
  guides(colour="none")


Upvotes: 1
