baxx

Reputation: 4725

Organise x ticks on plot sensibly, ggplot2

I have this plot from the provided test code:

[Image: plot produced by the test code]

I would like the x ticks to be organised sensibly (the plot created from the original data, shown further down, highlights the problem).

Here is some code that can be used as an example:

## Load ggplot2, which the plot at the end needs
library(ggplot2)

## Create some numbers for testing

set.seed(123)
Aboard <- sample(1:50,50)

## some years to use

Years <- c(1931, 1931, 1931, 1934, 1934, 1934, 1934, 1937, 1937, 1937, 1937, 1937, 1938, 1943, 1943, 1943, 1943, 1943, 1955, 1955, 1955, 1955, 1955, 1961, 1961, 1961, 1970, 1970, 1970, 1970, 1973, 1973, 1973, 1978, 1980, 1980, 1982, 1982, 1983, 1984, 1984, 1985, 1986, 1986, 1986, 1987, 1987, 1989, 1990, 1990)

df <- data.frame(Aboard, Years)

###############################################################################

## I WANT TO FIND THE SUM OF Aboard FOR EACH YEAR

## change years to factor variable, so that I have levels to work with.
df$Years <- factor(df$Years)

## blank vector to store sum values.
aboardYearTotal= c()


## iterate over the levels of the years vector.
for(y in levels(as.factor(df$Years))){
  ## I want to use an integer rather than a string
  y = as.numeric(y)
  ## for each level - find the sum of all Aboard values that correspond with it.
  ## I need to remove NA values as there are some.
  yy=sum(df$Aboard[df$Years==y], na.rm = TRUE)
  aboardYearTotal = c(aboardYearTotal, yy)
}

## I no longer need y, or yy
rm(y)
rm(yy)

###############################################################################

## Create plot using this variable

yearLevels <- levels(as.factor(df$Years))
aboardYears <- data.frame(yearLevels, aboardYearTotal)

## Create a plot of the data for total number aboard each year
p <- ggplot(aboardYears, aes(yearLevels, aboardYearTotal))
p + geom_point(aes(size = aboardYearTotal))

How can I control the ticks on the x axis here?

I've tried to play around with scale_x_continuous and scale_x_discrete but I can't get it to work as intended.

Ideally I will be able to choose the start value, end value, and spacing of the ticks.

For example if my start value was 0 and end value was 10, with a spacing of 2, I would have the x axis marked as:

0 2 4 6 8 10

Here is the original plot, which highlights the problem I'm having with the x axis:

[Image: plot created from the original data]

I'm open to suggestions or advice on better practices in general.

Upvotes: 2

Views: 93

Answers (1)

eipi10

Reputation: 93851

Don't convert Years to a factor. A factor gives ggplot a discrete axis, so every level gets its own tick and label. Instead, leave it numeric and use stat_summary to take care of the sum.

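library(ggplot2)

df <- data.frame(Aboard, Years)

# In ggplot2 >= 3.3.0, `fun` replaces the deprecated `fun.y`
# and after_stat(y) replaces ..y..
ggplot(df, aes(Years, Aboard)) +
  stat_summary(fun = sum, geom = "point", aes(size = after_stat(y)))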
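If the column has already been converted to a factor, as in the question, it can be turned back into numbers before plotting. A minimal sketch, using a small made-up vector (`years_factor` and `years_numeric` are hypothetical names, not from the original code):

```r
# Hypothetical small example: a Years column stored as a factor
years_factor <- factor(c(1931, 1931, 1934, 1990))

# as.numeric() on a factor returns the internal level codes (1, 1, 2, 3),
# so convert to character first to recover the actual years
years_numeric <- as.numeric(as.character(years_factor))

years_numeric
```

The detour through as.character() matters: calling as.numeric() directly on the factor would silently plot level codes instead of years.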

ggplot will pick sensible defaults for the x-axis labels, but you can change these as well. For example:

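# `fun` and after_stat(y) need ggplot2 >= 3.3.0 (older versions: fun.y and ..y..)
ggplot(df, aes(Years, Aboard)) +
  stat_summary(fun = sum, geom = "point", aes(size = after_stat(y))) +
  scale_x_continuous(breaks = seq(1920, 2020, 20))

[Image: resulting plot with x-axis breaks every 20 years]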

You can set the x-axis breaks to be at whatever values you want by providing a vector of those values. For example:

scale_x_continuous(breaks=seq(min(df$Years), max(df$Years)+6, 6))

or

scale_x_continuous(breaks=c(1931, 1955))
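To get the "start value, end value, spacing" control the question asks for, the breaks can be derived from the data range. The sketch below is one way to do it, assuming ggplot2 >= 3.3.0 and a small made-up data frame standing in for `df`; `spacing`, `start`, `end`, and `year_breaks` are illustrative names, and rounding to whole decades is an assumption, not something the question specifies.

```r
library(ggplot2)

# Hypothetical small data set standing in for the question's df
df <- data.frame(
  Years  = c(1931, 1931, 1934, 1955, 1955, 1973, 1986, 1990),
  Aboard = c(12, 30, 7, 41, 5, 19, 23, 8)
)

# Assumed choice: one tick every 10 years, starting and ending on a decade
spacing <- 10
start <- floor(min(df$Years) / spacing) * spacing    # round down to the decade
end   <- ceiling(max(df$Years) / spacing) * spacing  # round up to the decade
year_breaks <- seq(start, end, by = spacing)

# Sum Aboard per year and place ticks at the computed breaks;
# limits makes the axis span the full start-to-end range
p <- ggplot(df, aes(Years, Aboard)) +
  stat_summary(fun = sum, geom = "point", aes(size = after_stat(y))) +
  scale_x_continuous(breaks = year_breaks, limits = c(start, end))
p
```

Changing `spacing` to 5 or 20 is enough to make the ticks denser or sparser without touching the rest of the plot.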

Sometimes, you'll need or want to perform data summary operations outside of ggplot. There are a number of options. Here are a couple:

Base R

df.summary = aggregate(Aboard ~ Years, df, sum)

tidyverse

library(tidyverse)

df.summary = df %>%
  group_by(Years) %>% 
  summarise(Aboard = sum(Aboard))
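For comparison with the loop in the question, the same per-year totals can also be computed with tapply. This is only a sketch on a small made-up data frame; it keeps `na.rm = TRUE` because the question mentions missing values in the real data.

```r
# Hypothetical small data set standing in for the question's df,
# with one missing Aboard value
df <- data.frame(
  Years  = c(1931, 1931, 1934, 1955, 1955),
  Aboard = c(12, 30, NA, 41, 5)
)

# One sum per year; na.rm = TRUE is passed through to sum()
totals <- tapply(df$Aboard, df$Years, sum, na.rm = TRUE)

# tapply() returns a vector named by year, so rebuild a data frame
# with Years as numbers rather than as names
df.summary <- data.frame(
  Years  = as.numeric(names(totals)),
  Aboard = as.vector(totals)
)
df.summary
```

A year whose only value is NA ends up with a total of 0 here, which is what the question's loop would produce as well.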

You can even do this on the fly when you plot the data, without the need to create a separate summary data frame. For example:

ggplot(aggregate(Aboard ~ Years, df, sum), aes(Years, Aboard, size=Aboard)) +
  geom_point()

or

df %>%
  group_by(Years) %>% 
  summarise(Aboard = sum(Aboard)) %>% 
  ggplot(aes(Years, Aboard, size=Aboard)) +
    geom_point()

Upvotes: 3
