user1665355

Reputation: 3393

Custom spacing between x axis labels in ggplot

I have a df:

   Year          Ratio       N    Mean        sd        se        ci
97  1867 TILLBANK...PLACTILL 2  3.861999  4.082170  2.886530  36.67685
98  1867   TILLOBL..PLACTILL 2 21.848833 17.859532 12.628596 160.46153
99  1867   TILLLOAN.PLACTILL 2 54.197044 23.309360 16.482207 209.42629
100 1867   TILLEQUI.PLACTILL 2  0.000000  0.000000  0.000000   0.00000
101 1867   TILLCONT.PLACTILL 2  0.000000  0.000000  0.000000   0.00000
102 1867   TILLRECI.PLACTILL 2 10.772286  5.110514  3.613679  45.91615


str(df) :

     'data.frame':  1152 obs. of  7 variables:
 $ Year : Factor w/ 156 levels "1855","1856",..: 13 13 13 13 13 13 13 13 14 14 ...
 $ Ratio: Factor w/ 8 levels "TILLBANK...PLACTILL",..: 1 2 3 4 5 6 7 8 1 2 ...
 $ N    : num  2 2 2 2 2 2 2 2 2 2 ...
 $ Mean : num  3.86 21.85 54.2 0 0 ...
 $ sd   : num  4.08 17.86 23.31 0 0 ...
 $ se   : num  2.89 12.63 16.48 0 0 ...
 $ ci   : num  36.7 160.5 209.4 0 0 ...

1) I am doing a ggplot:

qqs<-ggplot(dfccomp, aes(x=Year, y=sd,colour=Ratio))+geom_point()+
    facet_grid(Ratio~.)+
    theme(axis.text.x  = element_text(angle=-90, hjust=0.5, size=11,colour="black"))

This plot works with geom_point() but not with geom_line(). If I use geom_point(), I get a very messy x-axis with all the years (from 1867 to 2010): (image not shown)

And if I use geom_line(), which does not work, I get: (image not shown)

So, I wonder how I can show only certain years on the x-axis?

2) The other strange thing that I don't understand is what happens if I convert df$Year above to numeric:

df$Year=as.numeric(as.character(df$Year))

The plot is then: (image not shown)

Now only 3 years are present on the x-axis, which is better but still not what I want.

Why do both geom_point() and geom_line() work now?

Updated: In the answer below I read that "Year is a factor and as such ggplot() will interpret that accordingly and produce a dotplot. The reason geom_line() doesn't do anything is that this geom doesn't make sense for the data supplied; the factor nature indicates to ggplot() that the x-axis is not continuous and there is nothing to draw between points on that axis, hence no lines.".

But I have a different plot where geom_line() works with a factor. Why is it so?

qq<-ggplot(df, aes(x=Year, y=Mean,colour=Ratio)) + 
    geom_errorbar(aes(ymin=Mean-sd, ymax=Mean+sd), colour="black", width=.1, position=position_dodge(.1)) +
    geom_line(position=position_dodge(.1)) +
    geom_point(position=position_dodge(.1), size=3, shape=21, fill="white") + # 21 is filled circle
    xlab("Year") +
    ylab("Mean (%)") +
    ggtitle("Ratios") +
    facet_grid(Ratio~.) +
    theme(axis.text.x  = element_text(angle=-90, hjust=0.5, size=11,colour="black"))

The picture: (image not shown)

Upvotes: 2

Views: 26946

Answers (2)

Sven Hohenstein

Reputation: 81683

If you use Year as factor, ggplot will print a label for every factor level. You can see this in your first two plots.

If you use Year as numeric variable, ggplot will automatically select a subset of the values for the labels of the x-axis. In your third plot, the distance between two breaks is 100.

You can manually specify where you want the break points on the x-axis with scale_x_continuous and its breaks argument. In the example below, the distance between the breaks is 20 years. Play around with the code to find the desired plot.

ggplot(df, aes(x=as.numeric(as.character(Year)), y=sd, colour=Ratio)) +
    geom_point() +
    facet_grid(Ratio~.) +
    theme(axis.text.x  = element_text(angle=-90, hjust=0.5, size=11,colour="black")) +
    scale_x_continuous(breaks = as.numeric(levels(df$Year))[c(TRUE, rep(FALSE, 19))])
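As a variation on the code above, the breaks can also be written out with seq(), which makes the spacing explicit and lets the labels fall on round years. This is only a sketch: the data frame `toy`, its values, and the column `YearNum` are made up for illustration and are not the asker's data.

```r
library(ggplot2)

# Toy data standing in for the question's df (values are invented).
set.seed(1)
toy <- data.frame(
  Year  = factor(rep(1867:2010, times = 2)),
  Ratio = rep(c("A", "B"), each = 144),
  sd    = runif(288, 0, 25)
)

# Convert the factor labels (not the internal level codes) to numbers.
toy$YearNum <- as.numeric(as.character(toy$Year))

# One break every 20 years, starting at a round decade.
year_breaks <- seq(1870, 2010, by = 20)

p <- ggplot(toy, aes(x = YearNum, y = sd, colour = Ratio)) +
  geom_line() +
  facet_grid(Ratio ~ .) +
  scale_x_continuous(breaks = year_breaks)
```

Changing the `by` value in seq() is the quickest way to try a denser or sparser axis.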

Upvotes: 6

Gavin Simpson

Reputation: 174778

Year is a factor and as such ggplot() will interpret that accordingly and produce a dotplot. The reason geom_line() doesn't do anything is that this geom doesn't make sense for the data supplied; the factor nature indicates to ggplot() that the x-axis is not continuous and there is nothing to draw between points on that axis, hence no lines.

That this is the case is clearly shown by the figure you get with geom_line() after converting Year to a numeric variable. Now ggplot(), following its grammar, produces a line chart for the continuous x-axis data.
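A small sketch of the factor-versus-numeric behaviour, using made-up toy data (the data frame `toy` and its values are assumptions, not the asker's data). It also shows one common way a line can appear even with a factor on the x-axis: by default ggplot2 groups by the combination of all discrete variables in the plot, so with a factor Year each group holds a single point, whereas mapping the group aesthetic explicitly lets geom_line() connect points across factor levels. Whether that is what happened in the asker's second plot cannot be confirmed from the post.

```r
library(ggplot2)

# Toy data (invented values); Year is a factor, as in the question.
toy <- data.frame(
  Year  = factor(rep(2001:2005, times = 2)),
  Ratio = rep(c("A", "B"), each = 5),
  sd    = c(1, 3, 2, 5, 4, 2, 2, 3, 1, 4)
)

# as.numeric() on a factor returns the internal level codes (1, 2, 3, ...),
# so go through as.character() to recover the actual years.
codes  <- as.numeric(toy$Year)
values <- as.numeric(as.character(toy$Year))

# Factor x with the default grouping: every Year/Ratio combination is its
# own group of one point, so geom_line() has nothing to connect.
p_factor <- ggplot(toy, aes(x = Year, y = sd, colour = Ratio)) +
  geom_line()

# Same factor x, but grouped by Ratio only: one line per Ratio.
p_grouped <- ggplot(toy, aes(x = Year, y = sd, colour = Ratio, group = Ratio)) +
  geom_line()

# Numeric x: lines are drawn without an explicit group.
p_numeric <- ggplot(toy, aes(x = values, y = sd, colour = Ratio)) +
  geom_line()
```

For yearly data the numeric (or Date) route is usually the more natural one, because the axis then reflects the real spacing between years.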

So now your question boils down to controlling the scale on the x-axis (scale is what ggplot() calls the axis). I see two options:

  1. Provide your own scale using scale_x_continuous() as documented here
  2. Convert your Year numeric data to a Date object and let ggplot() handle the scale or customise it via scale_x_date(), as documented and illustrated here

To convert to a Date object you could do something like this:

dfccomp <- transform(dfccomp,
                     Year = as.Date(paste(Year, "01", "01", sep = "-")))

Alter the two "01"s to whatever month (the first "01") or day of month you want; the choice is in effect arbitrary, because the data points will be 1 year apart regardless.

You can then use the minor_breaks argument in scale_x_date() to control how many minor ticks are shown and where, plus the breaks argument to set which years are shown. I suggest you don't show all years, otherwise the resulting plot will be a mess. You also don't need each year as a minor break, as again the grid lines will just swamp the plot.
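A minimal sketch of the Date approach, assuming made-up toy data in place of dfccomp (the data frame `toy`, the column `Date`, and the chosen 20-year and 5-year spacings are illustrative assumptions). It builds the major and minor breaks with seq() and passes them to scale_x_date(); date_labels = "%Y" prints only the year.

```r
library(ggplot2)

# Toy data standing in for dfccomp (values are invented).
set.seed(1)
toy <- data.frame(
  Year  = factor(rep(1867:2010, times = 2)),
  Ratio = rep(c("A", "B"), each = 144),
  sd    = runif(288, 0, 25)
)

# Turn each year into a Date on 1 January of that year.
toy$Date <- as.Date(paste(toy$Year, "01", "01", sep = "-"))

# Labelled breaks every 20 years, minor grid lines every 5 years.
major <- seq(as.Date("1870-01-01"), as.Date("2010-01-01"), by = "20 years")
minor <- seq(as.Date("1870-01-01"), as.Date("2010-01-01"), by = "5 years")

p <- ggplot(toy, aes(x = Date, y = sd, colour = Ratio)) +
  geom_line() +
  facet_grid(Ratio ~ .) +
  scale_x_date(breaks = major, minor_breaks = minor, date_labels = "%Y")
```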

Upvotes: 6
