Reputation: 387
I have a small data set that I reproduced below. It has customers in rows and quantities per month in columns. I was using ggplot2 to plot it two weeks ago and it worked fine. But now the time periods (x axis) are not sorted correctly: period "p_10" comes right after "p_1", where "p_2" should be.
The data created in the first few lines is the same format as my real-world data, so I don't want to create it differently.
My first question is: why did this work two weeks ago and now it does not? Several packages were updated in the last week, so I guess something changed.
Secondly (and more importantly), how do I fix this?
library(dplyr)
library(tidyr)
library(ggplot2)
# create data
a = paste("p_",1:20,sep = "")
b = paste("c",1:6,sep = "")
mydata2 = data.frame(matrix(rnorm(6 * 20),6,20))  # one random value per cell (6 customers x 20 periods)
names(mydata2) = a
mydata2$cust = b
mydata2 = mydata2[,c(ncol(mydata2),1:(ncol(mydata2)-1))]
# plot data
p_data = mydata2 %>% gather(period,Qty,-cust)
pl = ggplot(data=p_data,aes(x=period,y=Qty,group=cust,colour=cust)) +
  geom_line(size=.4)
# display plot
pl
Upvotes: 1
Views: 826
Reputation: 1456
You could also use factors and sort the levels of the factors. Not saying this is any better than the other answer, just another way!
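A minimal sketch of what sorting the levels could look like, using base R only. The `period` vector here is a hypothetical stand-in for the `period` column of `p_data`, and the approach assumes every label has the form "p_" followed by a number:

```r
# Hypothetical stand-in for the 'period' column of p_data
period <- paste0("p_", c(1, 10, 2, 20, 3))

# Take the unique labels and order them by the number after "p_"
lvls <- unique(period)
lvls <- lvls[order(as.integer(sub("^p_", "", lvls)))]

# Build the factor with the numerically ordered levels
period_f <- factor(period, levels = lvls)
levels(period_f)
```

With the real data, the same idea would be applied to `p_data$period` before plotting.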
Upvotes: 0
Reputation: 5308
As for your first question, the answer becomes evident when running sort on the period column. Its entries are character strings, so they are sorted alphabetically rather than numerically, and hence 'p_10', 'p_11', etc. occur before 'p_2', 'p_3', etc.
unique(sort(p_data[, 2]))
[1] "p_1" "p_10" "p_11" "p_12" "p_13" "p_14" "p_15" "p_16" "p_17" "p_18" "p_19" "p_2" "p_20" "p_3" "p_4" "p_5" "p_6"
[18] "p_7" "p_8" "p_9"
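To illustrate the difference, here is a small sketch comparing a character sort with a numeric sort. The vectors are made up for the illustration and are not part of the original data:

```r
# Character strings are compared one character at a time,
# so "p_10" sorts before "p_2" because "1" comes before "2"
labels <- c("p_2", "p_10")
sorted_labels <- sort(labels)

# The same values as numbers sort the way one would expect
numbers <- c(2, 10)
sorted_numbers <- sort(numbers)

sorted_labels
sorted_numbers
```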
As for your second question, I would recommend simply converting the second column of your data to 'factor'. In my experience, ggplot is much easier to handle when using 'factor' instead of 'character' variables because of, among other things, such sorting issues. Remember to define the desired factor levels manually. Otherwise, you will end up with 'p_1', 'p_10', 'p_11', etc. on the x-axis again.
p_data[, 2] <- factor(p_data[, 2], levels = unique(p_data[, 2]))
ggplot(data = p_data, aes(x = period, y = Qty, group = cust, colour = cust)) +
  geom_line(size = .4)
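As a possible alternative, `gather` has a `factor_key` argument that stores the key column as a factor with levels in the original column order, so the conversion can happen in the reshaping step. The sketch below uses a small made-up data frame (`wide`, an assumption for illustration) in place of `mydata2`:

```r
library(tidyr)

# Small stand-in for mydata2: a customer column plus 12 period columns
set.seed(1)
wide <- data.frame(cust = paste0("c", 1:3),
                   matrix(rnorm(3 * 12), nrow = 3))
names(wide)[-1] <- paste0("p_", 1:12)

# factor_key = TRUE returns 'period' as a factor whose levels
# follow the column order of 'wide' (p_1, p_2, ..., p_12)
long <- gather(wide, period, Qty, -cust, factor_key = TRUE)
levels(long$period)
```

This relies on the columns of the wide data already being in the desired order, which is the case for the data in the question.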
Upvotes: 2