Reputation: 3393
I have a df:
    Year Ratio               N Mean      sd        se        ci
97  1867 TILLBANK...PLACTILL 2  3.861999  4.082170  2.886530  36.67685
98  1867 TILLOBL..PLACTILL   2 21.848833 17.859532 12.628596 160.46153
99  1867 TILLLOAN.PLACTILL   2 54.197044 23.309360 16.482207 209.42629
100 1867 TILLEQUI.PLACTILL   2  0.000000  0.000000  0.000000   0.00000
101 1867 TILLCONT.PLACTILL   2  0.000000  0.000000  0.000000   0.00000
102 1867 TILLRECI.PLACTILL   2 10.772286  5.110514  3.613679  45.91615
str(df):
'data.frame': 1152 obs. of 7 variables:
$ Year : Factor w/ 156 levels "1855","1856",..: 13 13 13 13 13 13 13 13 14 14 ...
$ Ratio: Factor w/ 8 levels "TILLBANK...PLACTILL",..: 1 2 3 4 5 6 7 8 1 2 ...
$ N : num 2 2 2 2 2 2 2 2 2 2 ...
$ Mean : num 3.86 21.85 54.2 0 0 ...
$ sd : num 4.08 17.86 23.31 0 0 ...
$ se : num 2.89 12.63 16.48 0 0 ...
$ ci : num 36.7 160.5 209.4 0 0 ...
1) I am making a ggplot:
qqs<-ggplot(dfccomp, aes(x=Year, y=sd,colour=Ratio))+geom_point()+
facet_grid(Ratio~.)+
theme(axis.text.x = element_text(angle=-90, hjust=0.5, size=11,colour="black"))
This plot works with geom_point() but not with geom_line(). With geom_point(), I get a very messy x-axis showing every year (from 1867 to 2010):
With geom_line(), which does not work, I get:
So, how can I show only certain years on the x-axis?
2) The other strange thing that I don't understand is what happens if I convert df$Year above to numeric:
df$Year=as.numeric(as.character(df$Year))
The plot is then:
Now only 3 years are shown on the x-axis, which is better but still not what I want. Why do both geom_point() and geom_line() work now?
Updated: On the answer below I read that "Year is a factor and as such ggplot() will interpret that accordingly and produce a dotplot. The reason geom_line() doesn't do anything as this geom doesn't make sense for the data supplied; the factor nature indicates to ggplot() that the x-axis is not continuous and there is nothing to draw between points on that axis, hence no lines.".
But I have a different plot where geom_line() works with a factor. Why is that?
qq <- ggplot(df, aes(x=Year, y=Mean, colour=Ratio)) +
  geom_errorbar(aes(ymin=Mean-sd, ymax=Mean+sd), colour="black", width=.1, position=position_dodge(.1)) +
  geom_line(position=position_dodge(.1)) +
  geom_point(position=position_dodge(.1), size=3, shape=21, fill="white") + # 21 is filled circle
  xlab("Year") +
  ylab("Mean (%)") +
  ggtitle("Ratios") +
  facet_grid(Ratio~.) +
  theme(axis.text.x = element_text(angle=-90, hjust=0.5, size=11, colour="black"))
The picture:
Upvotes: 2
Views: 26946
Reputation: 81683
If you use Year as a factor, ggplot will print a label for every factor level. You can see this in your first two plots.
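If you would rather keep Year as a factor, one possible alternative is to thin out the labels with the breaks argument of scale_x_discrete(). The sketch below is a minimal illustration on made-up data (the toy data frame and the choice of every 10th level are assumptions, not taken from the question):

```r
library(ggplot2)

# Hypothetical toy data: one value per year, with Year stored as a factor
toy <- data.frame(
  Year = factor(1867:1906),
  sd   = seq(0, 39)
)

# Keep every 10th factor level as an axis break
shown <- levels(toy$Year)[seq(1, nlevels(toy$Year), by = 10)]

# Only the levels listed in `breaks` get a tick and a label
p <- ggplot(toy, aes(x = Year, y = sd)) +
  geom_point() +
  scale_x_discrete(breaks = shown)
```

All points are still drawn; only the tick labels are reduced. The x-axis remains discrete, so this does not by itself make geom_line() connect the points.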
If you use Year as a numeric variable, ggplot will automatically select a subset of the values for the x-axis labels. In your third plot, the distance between two breaks is 100.
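A short sketch of why the conversion goes through as.character(): calling as.numeric() directly on a factor returns the internal integer codes, not the years. The three-level factor here is a made-up example.

```r
# Hypothetical factor of years
Year <- factor(c("1867", "1868", "2010"))

codes  <- as.numeric(Year)                # internal codes: 1 2 3
years  <- as.numeric(as.character(Year))  # the actual years: 1867 1868 2010
years2 <- as.numeric(levels(Year))[Year]  # equivalent, converts each level once
```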
You can manually specify where you want the break points on the x-axis with scale_x_continuous and its breaks argument. In the example below, the distance between the breaks is 20. Play around with the code to find the desired plot.
ggplot(df, aes(x=as.numeric(as.character(Year)), y=sd, colour=Ratio)) +
geom_point() +
facet_grid(Ratio~.) +
theme(axis.text.x = element_text(angle=-90, hjust=0.5, size=11,colour="black")) +
scale_x_continuous(breaks = as.numeric(levels(df$Year))[c(TRUE, rep(FALSE, 19))])
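To see what the breaks expression above selects: the logical index c(TRUE, rep(FALSE, 19)) has length 20 and is recycled along the levels, so it keeps every 20th one. The sketch below assumes the 156 levels are the consecutive years 1855 to 2010, which is what str(df) suggests but does not fully show.

```r
# Assumed levels: consecutive years 1855..2010 (156 values)
lv <- as.character(1855:2010)

# The length-20 logical index is recycled, keeping positions 1, 21, 41, ...
picked <- as.numeric(lv)[c(TRUE, rep(FALSE, 19))]
picked
# 1855 1875 1895 1915 1935 1955 1975 1995

# Under the same assumption, seq() gives the same breaks more directly
same <- seq(1855, 2010, by = 20)
```

Changing 19 to another number (or the by value in seq()) changes the spacing of the breaks.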
Upvotes: 6
Reputation: 174778
Year is a factor, and as such ggplot() will interpret it accordingly and produce a dotplot. geom_line() doesn't do anything because this geom doesn't make sense for the data supplied: the factor nature indicates to ggplot() that the x-axis is not continuous and there is nothing to draw between points on that axis, hence no lines.
That this is the case is clearly shown by the figure you get with geom_line() after converting Year to a numeric variable. Now ggplot(), following its grammar, produces a line chart for the continuous x-axis data.
So now your question boils down to controlling the scale on the x-axis (scale is what ggplot() calls the axis). I see two options:
1) Use scale_x_continuous(), as described in its documentation.
2) Convert the Year numeric data to a Date object and let ggplot() handle the scale, or customise it via scale_x_date(), as described and illustrated in its documentation.
To convert to a Date object you could do something like this:
dfccomp <- transform(dfccomp,
Year = as.Date(paste(Year, "01", "01", sep = "-")))
Alter the two "01"s to be whatever month (the first "01") or day of month (the second) you want. Whatever you choose is in effect arbitrary and makes no difference: the data points will still be 1 year apart.
You can then use the minor_breaks argument in scale_x_date() to control how many minor ticks are shown and where, plus the breaks argument to set which years are shown. I suggest you don't show all years, otherwise the resulting plot will be a mess. You also don't need each year as a minor break, as again the grid lines will just swamp the plot.
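A minimal sketch of the Date approach on made-up data. The toy data frame, the 20-year major spacing and the 5-year minor spacing are all assumptions chosen for illustration; adjust them to taste.

```r
library(ggplot2)

# Hypothetical toy data: one value per year, 1867..2010
toy <- data.frame(Year = 1867:2010, sd = seq_len(144))

# Convert the year to a Date on 1 January of that year
toy$Year <- as.Date(paste(toy$Year, "01", "01", sep = "-"))

# Major breaks every 20 years, minor breaks every 5 years
major <- seq(as.Date("1870-01-01"), as.Date("2010-01-01"), by = "20 years")
minor <- seq(as.Date("1870-01-01"), as.Date("2010-01-01"), by = "5 years")

p <- ggplot(toy, aes(x = Year, y = sd)) +
  geom_line() +
  scale_x_date(breaks = major, minor_breaks = minor, date_labels = "%Y")
```

date_labels = "%Y" prints only the year, so the arbitrary month and day never appear on the axis.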
Upvotes: 6