Reputation: 576
I am trying to plot a graph for a data frame that looks like this:
year week cases
2003 1 0
2003 2 0
2003 3 12
2003 4 23
2003 5 12
2003 6 16
2003 7 20
2003 8 13
2003 9 0
2003 10 0
2003 11 21
2003 12 133
2003 13 9
2003 14 22
It carries data for 52 weeks per year, running from 2003 to 2012.
Here's what running dput(head(df, 20)) gives me:
structure(list(year = c(2003L, 2003L, 2003L, 2003L, 2003L, 2003L,
2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L,
2003L, 2003L, 2003L, 2003L, 2003L), week = 1:20, cases = c(2,
2, 26, 146, 26, 70, 115, 37, 2, 2, 124, 41, 245, 135, 146, 163,
26, 26, 92, 92)), .Names = c("year", "week", "cases"), row.names = 1925:1944, class = "data.frame")
I want my Y-axis to be simply the range of the variable 'cases', and the X-axis to run from week 1 through 52. I want to plot every year's data points in a different color.
Here's my ggplot2 code:
ggplot(df, aes(x=week, y=cases, col=year)) + geom_point()
This is the graph it's generating:
Why is this happening? I see no reason why my Y-axis shouldn't just be the range of 'cases' in ascending order.
Upvotes: 1
Views: 7051
Reputation: 24074
To sum up what was said in the comments:
Your y-axis is indeed sorted, but according to the character values (or rather the factor levels, as your variable was imported as a factor) and not the numeric ones (so 1, 10, 11, ..., 2, 20, ...).
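To see this ordering outside of a plot, here is a minimal sketch with made-up values standing in for the 'cases' column (the vector x is hypothetical, not taken from the question's data):

```r
# Hypothetical values standing in for the 'cases' column
x <- factor(c("1", "2", "10", "11", "20", "3"))

# Levels are sorted as character strings, not as numbers
levels(x)

# as.numeric() applied directly to a factor returns the internal
# level codes, not the values the labels represent
as.numeric(x)

# Going through character first recovers the actual numbers
as.numeric(as.character(x))
```

ggplot2 places a factor on a discrete axis in level order, which is why the axis looks "unsorted" when read as numbers.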
There are two problems that need to be solved:
The first one is that you have to understand why the variable wasn't imported as numeric. You probably have a "strange" value (like 1,2 for example, i.e. a comma instead of a point as the decimal separator).
The second one is that you need numeric values to plot your data correctly. For that, you can convert your factor with df$cases <- as.numeric(as.character(df$cases)). Note that the strange value(s) will be converted to NA, which you may not want.
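One way to locate the offending values before they silently turn into NA is sketched below. The data frame here is a hypothetical stand-in, and treating the comma as a decimal separator is an assumption you would need to check against your own data:

```r
# Hypothetical column with one value using a comma as decimal separator
df <- data.frame(cases = factor(c("0", "12", "1,2", "23")))

# Convert via character; suppressWarnings() hides the
# "NAs introduced by coercion" warning
cases_chr <- as.character(df$cases)
cases_num <- suppressWarnings(as.numeric(cases_chr))

# Entries that were present before but are NA after conversion
# are the values that could not be parsed as numbers
bad <- unique(cases_chr[is.na(cases_num) & !is.na(cases_chr)])
bad

# Assumption: the comma really is a decimal separator, so swap it
# for a point before converting
df$cases <- as.numeric(sub(",", ".", cases_chr, fixed = TRUE))
```

Inspecting bad first lets you decide whether to repair the values, as in the last line, or to drop them.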
Just a final note: if you don't want your character variables to be imported as factors, you can use the parameter stringsAsFactors=FALSE in the import step.
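As a small illustration, assuming the data comes from a CSV file read with read.csv (the question does not say how it was imported, and the inline text below is a hypothetical stand-in for the file):

```r
# Hypothetical CSV content standing in for the real file;
# the last row has a non-numeric 'cases' value
csv_text <- "year,week,cases\n2003,1,0\n2003,2,12\n2003,3,abc"

df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# 'cases' stays a character column instead of becoming a factor,
# while 'year' and 'week' are read as integers
str(df)
```

From R 4.0.0 onwards stringsAsFactors already defaults to FALSE, so the argument only changes behaviour on older versions. Either way, the column still has to be converted to numeric before plotting.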
Upvotes: 4