Reputation: 117
I'm working with R. I have a dataframe that looks like this:
df <- (structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L), .Label = c("2013", "2014", "2015", "2016", "2017"),
class = "factor"), user = structure(c(2L, 4L, 1L, 3L, 5L, 2L, 4L, 1L,
3L, 5L, 2L, 4L, 1L, 3L, 5L, 2L, 4L, 1L, 3L, 5L, 2L, 4L, 1L, 3L, 5L),
.Label = c("John", "Laura", "Liz", "Mark", "Martha"), class = "factor"),
spent = c(56, 64, 69, 38, 93, 70, 29, 94, 56, 76, 48, 17,
74, 67, 100, 29, 16, 23, 10, 51, 72, 35, 77, 83, 17)),
class = "data.frame", row.names = c(NA, -25L)))
I'm trying to generate a histogram with the "spent" variable on the y-axis, the "user" on the x-axis, and a facet for each year. For each year, the users should be ordered based on the "spent" variable.
I tried something like df$user2=factor(df$user, levels = df$user[order(df$year,df$spent)])
But I get an error saying that the 6th factor is duplicated.
Any help is greatly appreciated!
Gerry
Upvotes: 1
Views: 80
Reputation: 93761
What you are describing is a bar plot. A histogram shows the distribution of a single continuous variable (for example, hist(rnorm(100))).
Your ordering statement gave an error because each level in a factor variable (each unique value of user in this case) can appear only once in the levels argument, and df$user[order(df$year,df$spent)] contains every user once per year. factor allows you to set a new ordering of the unique levels of user. For example, instead of alphabetic ordering, we can use levels=c("Liz","Laura","Mark","John","Martha"). Then df[order(df$user),] will sort the data frame by the new order of user, and df[order(df$year, df$user),] will sort by year, then user. However, we can't use factor to get a different order of user for each year.
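As a quick illustration of the point about unique levels, here is a minimal base R sketch. The vector user and the names user2 and res are made up for this example and are not part of the question's data frame; the exact wording of the error message may differ between R versions.

```r
# Hypothetical toy vector, one entry per user
user <- factor(c("John", "Laura", "Liz", "Mark", "Martha"))

# Each level listed exactly once: this sets a custom (non-alphabetic) order
user2 <- factor(user, levels = c("Liz", "Laura", "Mark", "John", "Martha"))
levels(user2)

# A level listed twice: factor() refuses, so capture the error message
res <- tryCatch(
  factor(user, levels = c("Liz", "Liz", "Laura", "Mark", "John", "Martha")),
  error = function(e) conditionMessage(e)
)
res
```

The second call fails for the same reason the statement in the question does: the levels vector repeats a value.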
Based on your description, it looks like you want a faceted plot, but with a different x-axis order in each facet. You can do this in ggplot if you create a new variable that sets the x-axis order (I've called this variable r below) and then use the labels argument in scale_x_continuous to get the desired axis labels.
library(tidyverse)

df = df %>%
  # Convert year from factor back to numeric
  mutate(year = as.numeric(as.character(year))) %>%
  # Sort data into the order we want
  arrange(year, spent) %>%
  # Create a new variable with the desired row order
  mutate(r = row_number())

ggplot(df, aes(r, spent)) +
  geom_col() +
  # scales="free_x" lets each facet show only its own x positions
  facet_grid(. ~ year, scales="free_x") +
  # Label each position r with the matching user name
  scale_x_continuous(breaks=df$r, labels=as.character(df$user))
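If it helps to see what r is doing, here is a rough base R sketch of the same sort-then-number idea on a small made-up data frame. The name toy and its values are hypothetical and are not the question's data; the check at the end only confirms that spent never decreases along r within a year.

```r
# Hypothetical toy data: three users over two years
toy <- data.frame(
  year  = rep(c(2013, 2014), each = 3),
  user  = rep(c("John", "Laura", "Liz"), times = 2),
  spent = c(69, 56, 38, 20, 70, 56),
  stringsAsFactors = FALSE
)

# Sort by year, then by spent, and number the rows in that order
toy <- toy[order(toy$year, toy$spent), ]
toy$r <- seq_len(nrow(toy))

# Within each year, spent should be non-decreasing along r
sorted_within_year <- tapply(toy$spent, toy$year, function(s) !is.unsorted(s))

toy
sorted_within_year
```

Because r is unique across the whole data frame, the same user gets a different x position in each year, which is what allows a different bar order in every facet.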
The above plot seems confusing due to the user order changing in each facet. Maybe something like this would work better:
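ggplot(df, aes(year, spent, colour=user, group=user)) +
  geom_line() +
  geom_point() +
  # Label each line at the first year instead of using a legend
  geom_text(data=df %>% filter(year==min(year)), aes(label=user),
            hjust=1, position=position_nudge(x=-0.1), size=3) +
  # Leave room on the left for the labels and start the y-axis at zero
  expand_limits(y=0, x=2012.5) +
  theme_classic() +
  # Hide the colour legend ("none" replaces the deprecated FALSE)
  guides(colour="none")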
Upvotes: 1