Paul

Reputation: 387

X axis with ggplot2 not sequential

I have a small data set, reproduced below, with customers in rows and quantities per month in columns. I plotted it with ggplot2 two weeks ago and it worked fine, but now the time periods on the x axis are not sorted correctly: period "p_10" comes right after "p_1", where "p_2" should be.

The data created in the first few lines is the same format as my real-world data, so I don't want to create it differently.

My first question is: why did this work two weeks ago but not now? Several packages were updated in the last week, so I guess something changed.

Secondly, (and more importantly) how do I fix this?

library(dplyr)
library(tidyr)
library(ggplot2)

# create data
a = paste("p_",1:20,sep = "")
b = paste("c",1:6,sep = "")
mydata2 = data.frame(matrix(rnorm(6 * 20),6,20))  # 6 customers x 20 periods, no recycled values
names(mydata2) = a
mydata2$cust = b
mydata2 = mydata2[,c(ncol(mydata2),1:(ncol(mydata2)-1))]

# plot data
p_data = mydata2 %>% gather(period,Qty,-cust)

pl=(ggplot(data=p_data,aes(x=period,y=Qty,group=cust,colour=cust)) +
    geom_line(size=.4))

# display plot
pl

Upvotes: 1

Views: 826

Answers (2)

Ted Mosby

Reputation: 1456

You could also use factors and sort the levels of the factor. Not saying this is any better than the other answer, just another way!
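A rough sketch of what sorting the levels could look like, using base R only. The label vector is made up to mirror the question's "p_1" ... "p_20" names, and the `suffix` and `ordered_levels` names are just for illustration:

```r
# Hypothetical period labels, in the alphabetical order a character sort gives
periods <- sort(paste0("p_", 1:20))

# Pull out the numeric suffix and order the labels by it
suffix <- as.integer(sub("^p_", "", periods))
ordered_levels <- periods[order(suffix)]

# A factor built with these levels sorts (and plots) in numeric order
f <- factor(c("p_10", "p_2", "p_1"), levels = ordered_levels)
as.character(sort(f))
```

With the question's data, the same idea would presumably be applied as `p_data$period <- factor(p_data$period, levels = ordered_levels)` before plotting.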

Upvotes: 0

fdetsch

Reputation: 5308

As for your first question, the cause becomes evident when you run sort on the second column of your data. That column is a character vector, and character values are sorted alphabetically rather than numerically, hence 'p_10', 'p_11', etc. occur before 'p_2', 'p_3', etc.

unique(sort(p_data[, 2]))
 [1] "p_1"  "p_10" "p_11" "p_12" "p_13" "p_14" "p_15" "p_16" "p_17" "p_18" "p_19" "p_2"  "p_20" "p_3"  "p_4"  "p_5"  "p_6" 
[18] "p_7"  "p_8"  "p_9" 

As for your second question, I would recommend simply converting the second column of your data to 'factor'. In my experience, ggplot is much easier to handle with 'factor' than with 'character' variables because of, among other things, such sorting issues. Remember to define the desired factor levels explicitly, in the order you want them. Otherwise the levels default to alphabetical order and you will end up with 'p_1', 'p_10', 'p_11', etc. on the x-axis again.

p_data[, 2] <- factor(p_data[, 2], levels = unique(p_data[, 2]))

ggplot(data = p_data, aes(x = period, y = Qty, group = cust, colour = cust)) +
      geom_line(size = .4)
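If you would rather get the factor directly from the reshaping step, tidyr's gather has a `factor_key` argument that stores the key column as a factor with levels in the original column order. A minimal sketch on made-up toy data (the `wide` and `long` names and the values are not from the question):

```r
library(tidyr)

# Toy wide data in the same shape as the question (values are made up)
wide <- data.frame(cust = c("c1", "c2"),
                   p_1 = c(1, 2), p_2 = c(3, 4), p_10 = c(5, 6))

# factor_key = TRUE keeps the key column as a factor whose levels
# follow the column order of the input, not alphabetical order
long <- gather(wide, period, Qty, -cust, factor_key = TRUE)

levels(long$period)
```

This only helps when the columns of the wide data are already in the order you want on the x-axis, which seems to be the case for the data in the question.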


Upvotes: 2
