Margaret

Reputation: 5919

Display an axis value in millions in ggplot

I have a chart plotting some very large numbers, in the millions. My audience is unlikely to understand scientific notation, so I'd like to label the y axis with something like "2M" for two million.

Here's an example. Showing the full value (scales::comma) is better than the scientific notation it defaults to, but is still a bit busy:

library(ggplot2)
ggplot(as.data.frame(list(x = c(0, 200,100), y = c(7500000,10000000,2000000))), 
       aes(x = x, y = y)) +
  geom_point() +
  expand_limits( x = c(0,NA), y = c(0,NA)) +
  scale_y_continuous(labels = scales::comma)

I don't want to rescale the data, since I will be including labels with the values of the individual data points as well.

Upvotes: 38

Views: 49561

Answers (6)

SMM

Reputation: 1

The addUnits function is great! I've been using it a lot, but I recently noticed it sometimes fails with y-axis scales that transition from thousands into millions, or millions into billions, or billions into trillions.

This is due to rounding. For instance, you may get the following values across your y-axis: (200k, 400k, 600k, 800k, 1M, 1M), where the second 1M should really be 1.2M.

I made the following adjustment, and just wanted to share in case it helps anyone else. There's probably a more elegant way to do it, but this works for me.

addUnits <- function(n) {
  labels <- ifelse(n < 1000, n,  # less than thousands
               ifelse(n < 1e6, paste0(round(n/1e3), 'k'),  # in thousands
                      ifelse(n < 1e9,
                             ifelse(round(n/1e6)==round(n/1e6,digits=1),
                                    paste0(round(n/1e6), 'M'),  # in millions
                                    paste0(round(n/1e6,digits=1), 'M')), # in 1.x millions
                             ifelse(n < 1e12,
                                    ifelse(round(n/1e9)==round(n/1e9,digits=1),
                                           paste0(round(n/1e9), 'B'), # in billions
                                           paste0(round(n/1e9,digits=1), 'B')), # in 1.x billions
                                    ifelse(n < 1e15,
                                           ifelse(round(n/1e12)==round(n/1e12,digits=1),
                                                  paste0(round(n/1e12), 'T'), # in trillions
                                                  paste0(round(n/1e12,digits=1), 'T')), # in 1.x trillions
                                           'too big!'
                                    )))))
 return(labels)
}
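
One possibly tidier way to get the same effect, offered as a sketch rather than a drop-in replacement: pick the suffix with findInterval() and let paste0() drop the trailing zero. The name add_units and its digits argument are my own, not part of the original function. Unlike addUnits, it does not return 'too big!' above 1e15, and it returns NA for NA breaks.

```r
# Hypothetical helper (name and `digits` argument are illustrative); base R only.
add_units <- function(n, digits = 1) {
  suffixes <- c("", "k", "M", "B", "T")
  # Number of thresholds at or below |n|: 0 for < 1e3, 1 for thousands, ..., 4 for trillions
  idx <- findInterval(abs(n), c(1e3, 1e6, 1e9, 1e12))
  # Scale to the chosen unit and keep at most `digits` decimals
  scaled <- round(n / 1000^idx, digits)
  # Number-to-string conversion drops trailing zeros: 1 -> "1", 1.2 -> "1.2"
  labels <- paste0(scaled, suffixes[idx + 1])
  # Keep missing breaks missing instead of labelling them "NANA"
  labels[is.na(n)] <- NA_character_
  labels
}

add_units(c(0, 2e5, 1e6, 1.2e6, 2.5e9))
# [1] "0"    "200k" "1M"   "1.2M" "2.5B"
```

It can be passed to the scale in the same way: scale_y_continuous(labels = add_units).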

Upvotes: 0

Nayef

Reputation: 515

As in many other situations when using ggplot2, I think the simplest way to do this is to manipulate the data before passing it to the ggplot() function. I would just create a new data column with the values in millions, like this:

library(dplyr)
library(ggplot2)

df <- data.frame(x = c(0, 200,100),
                 y = c(7500000,10000000,2000000)) %>% 
    mutate(y_millions = y/1e6)

ggplot(df, 
       aes(x = x, 
           y = y_millions)) + 
    geom_point() + 
    labs(y = "y (in millions)")

Upvotes: 4

vinnief

Reputation: 821

In the scales package, the function label_number_si() automatically scales and labels values with the best-fitting unit suffix: "K" for values ≥ 10^3, "M" for ≥ 10^6, "B" for ≥ 10^9, and "T" for ≥ 10^12. See the scales documentation for details.

So:

library(ggplot2)
ggplot(as.data.frame(list(x = c(0, 200,100), y = c(7500000,10000000,2000000))), 
       aes(x = x, y = y)) +
  geom_point() +
  expand_limits(x = c(0, NA), y = c(0,NA)) +
  scale_y_continuous(labels = scales::label_number_si())
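
A note for newer versions of scales: as far as I know, label_number_si() was deprecated in scales 1.2.0, and the suggested replacement is label_number() with scale_cut = cut_short_scale(). A sketch assuming scales >= 1.2.0 (the names df, short_labels and p are illustrative):

```r
library(ggplot2)
library(scales)

df <- data.frame(x = c(0, 200, 100), y = c(7500000, 10000000, 2000000))

# Assumes scales >= 1.2.0, where scale_cut and cut_short_scale() are available
short_labels <- label_number(scale_cut = cut_short_scale())

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  expand_limits(x = c(0, NA), y = c(0, NA)) +
  scale_y_continuous(labels = short_labels)

p
```

The underlying data is untouched, so point labels can still use the raw values.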

Upvotes: 34

Tung

Reputation: 28331

I think you can just set your labels and breaks manually:

library(ggplot2)

ylab <- c(2.5, 5.0, 7.5, 10)

ggplot(as.data.frame(list(x = c(0, 200, 100), y = c(7500000, 10000000, 2000000))), 
       aes(x = x, y = y)) +
  geom_point() +
  expand_limits(x = c(0, NA), y = c(0, NA)) +
  scale_y_continuous(labels = paste0(ylab, "M"),
                     breaks = 10^6 * ylab
  )

Edit: adding a more generic solution

# Ref: https://5harad.com/mse125/r/visualization_code.html
addUnits <- function(n) {
  labels <- ifelse(n < 1000, n,  # less than thousands
                   ifelse(n < 1e6, paste0(round(n/1e3), 'k'),  # in thousands
                          ifelse(n < 1e9, paste0(round(n/1e6), 'M'),  # in millions
                                 ifelse(n < 1e12, paste0(round(n/1e9), 'B'), # in billions
                                        ifelse(n < 1e15, paste0(round(n/1e12), 'T'), # in trillions
                                               'too big!'
                                        )))))
  return(labels)
}

ggplot(as.data.frame(list(x = c(0, 200, 100, 250, 300), 
                          y = c(500000, 1000000, 200000, 90000, 150000))), 
       aes(x = x, y = y)) +
  geom_point() +
  expand_limits(x = c(0, NA), y = c(0, NA)) +
  scale_y_continuous(labels = addUnits)

Created on 2018-10-01 by the reprex package (v0.2.1.9000)

Upvotes: 23

João

Reputation: 771

I find scales::unit_format() to be more readable:

library(dplyr)
library(scales)
library(ggplot2)

as.data.frame(
  list(x = c(0, 200, 100), 
       y = c(7500000, 10000000, 2000000))) %>%
  ggplot(aes(x, y)) +
  geom_point() +
  expand_limits(x = c(0, NA), y = c(0, NA)) +
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6))
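
For reference, the scales documentation describes unit_format() as kept for backward compatibility and points to label_number() instead. A sketch of what I believe is the closest equivalent (the names df and millions are illustrative). One difference, as far as I recall: unit_format() puts a space before the unit by default, whereas suffix = "M" is attached directly to the number.

```r
library(ggplot2)
library(scales)

df <- data.frame(x = c(0, 200, 100), y = c(7500000, 10000000, 2000000))

# Divide by one million for display only and append "M"; the data itself is not rescaled
millions <- label_number(scale = 1e-6, suffix = "M")

ggplot(df, aes(x, y)) +
  geom_point() +
  expand_limits(x = c(0, NA), y = c(0, NA)) +
  scale_y_continuous(labels = millions)
```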

Upvotes: 67

Konrad

Reputation: 18585

It is worth adding that a function passed to the scale can create labels without specifying the breaks argument. As stated in ?scale_y_continuous, labels can take:

One of:

  • NULL for no labels
  • waiver() for the default labels computed by the transformation object
  • A character vector giving labels (must be same length as breaks)
  • A function that takes the breaks as input and returns labels as output

Creating a sample function is trivial:

(function(l) {paste0(round(l/1e6,1),"m")})(5e6)
# [1] "5m"

Hence the solution could be:

ggplot(as.data.frame(list(x = c(0, 200,100), y = c(7500000,10000000,2000000))), 
       aes(x = x, y = y)) +
    geom_point() +
    expand_limits( x = c(0,NA), y = c(0,NA)) +
    scale_y_continuous(labels = function(l) {
        paste0(round(l/1e6,1),"m")
    })

There is no need to specify the breaks argument.

In the UK we tend to use a lowercase "m".

Upvotes: 10
