Reputation: 341
I am trying to create a plot of user login behavior for a two-month period. I used the qplot function from the ggplot2 package, and with the following code
qplot(date_time, login_count, data=client_login_clean)
I plotted login_count over time as shown below. Unfortunately, the y-axis, num_records, is sorted such that the first five tick marks on the y-axis are 1, 10, 106, 11, and 12, rather than 1, 2, 3, 4, 5. Could someone let me know how to fix this?
Upvotes: 0
Views: 585
Reputation: 72779
You almost certainly have your variable encoded as a factor rather than an integer. Try client_login_clean$login_count <- as.numeric(as.character(client_login_clean$login_count)) and then run it again. An alternative is taRifx::destring.
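To see why the as.character step matters, here is a minimal sketch using a small made-up factor (the values are illustrative, not taken from the asker's data). Calling as.numeric directly on a factor returns the internal level codes rather than the printed values:

```r
# Hypothetical toy factor standing in for login_count
f <- factor(c("1", "2", "10", "106"))

# Direct conversion returns the level codes, not the values
codes <- as.numeric(f)
print(codes)    # 1 4 2 3

# Going through character first recovers the actual numbers
values <- as.numeric(as.character(f))
print(values)   # 1 2 10 106
```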
As a side note, a wizard never mis-sorts his axis. He sorts it precisely as he means to.
Upvotes: 2
Reputation: 13913
That's because, for some reason, your login_count variable is a character vector. ggplot internally coerces all character vectors to factors, with labels ordered alphabetically, and then sorts the axis according to that order.
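The alphabetical ordering can be reproduced in base R without any plotting. This is only a sketch with invented values; it shows the default level order factor() assigns to number-like strings, which matches the tick order described in the question:

```r
# Hypothetical character values resembling the problematic column
x <- c("1", "2", "10", "106", "11", "12")

# factor() sorts the unique strings to build its levels,
# so "10" comes before "2"
lv <- levels(factor(x))
print(lv)   # "1" "10" "106" "11" "12" "2"
```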
I also think I know why this happened: "num_records" is actually a value in your login_count column, so the whole thing has been coerced to a character vector. Delete that element and convert the column with as.numeric; the ordering should then be correct. This is a good opportunity to read over your data loading/generating process and make sure you haven't made any other mistakes. Sometimes the tiniest bugs can uncover massive problems you would never have noticed otherwise.
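One way to locate and drop such a stray entry is sketched below. The data frame here is a made-up stand-in that reuses the names client_login_clean and login_count from the question; the real data may contain other non-numeric values, so it is worth inspecting the flagged rows before deleting them:

```r
# Hypothetical stand-in for the real data, with a stray header value
client_login_clean <- data.frame(
  login_count = c("1", "2", "num_records", "10"),
  stringsAsFactors = FALSE
)

x <- as.character(client_login_clean$login_count)

# Entries that are present but cannot be parsed as numbers
bad <- is.na(suppressWarnings(as.numeric(x))) & !is.na(x)
print(client_login_clean[bad, , drop = FALSE])

# Drop the offending rows, then convert the column
cleaned <- client_login_clean[!bad, , drop = FALSE]
cleaned$login_count <- as.numeric(as.character(cleaned$login_count))
print(cleaned$login_count)   # 1 2 10
```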
As a side note, this is why you should be careful with character-class variables when plotting with ggplot. You can save yourself a lot of headaches by explicitly specifying a factor ordering up front.
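If a variable really should stay discrete, the level order can be set explicitly with the levels argument of factor(). A minimal sketch with invented values, ordering number-like strings by their numeric value:

```r
# Hypothetical number-like strings that should remain a factor
x <- c("1", "10", "106", "11", "12", "2")

# Build the levels in numeric order rather than alphabetical order
ordered_levels <- as.character(sort(as.numeric(unique(x))))
f <- factor(x, levels = ordered_levels)
print(levels(f))   # "1" "2" "10" "11" "12" "106"
```

A factor built this way keeps its level order when passed to ggplot, so the axis follows the order given in levels.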
Upvotes: 3