Reputation: 5919
I have a chart where I am charting some very large numbers, in the millions. My audience is unlikely to understand scientific notation, so I'm hoping to label the y axis in something like "2M" for two million for example.
Here's an example. Showing the full value (scales::comma) is better than the scientific notation it defaults to, but it is still a bit busy:
library(ggplot2)
ggplot(as.data.frame(list(x = c(0, 200, 100), y = c(7500000, 10000000, 2000000))),
       aes(x = x, y = y)) +
  geom_point() +
  expand_limits(x = c(0, NA), y = c(0, NA)) +
  scale_y_continuous(labels = scales::comma)
I don't want to rescale the data, since I will be including labels with the values of the individual data points as well.
Upvotes: 38
Views: 49561
Reputation: 1
The addUnits function is great! I've been using it a lot, but I recently noticed it sometimes fails with y-axis scales that transition from thousands into millions, or millions into billions, or billions into trillions.
This is due to rounding. For instance, you may get the following values across your y-axis: (200k, 400k, 600k, 800k, 1M, 1M), where the second 1M should really be 1.2M.
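A minimal base R illustration of the rounding problem, assuming breaks of 1M, 1.2M and 1.4M (values chosen for this example, not taken from the post). Rounding to whole millions collapses all three into the same label, while keeping one decimal place keeps them distinct:

```r
# Illustrative break values (chosen for this example, not from the original post)
breaks <- c(1e6, 1.2e6, 1.4e6)

# Rounding to whole millions collapses all three to the same label
whole <- paste0(round(breaks / 1e6), "M")
print(whole)   # "1M" "1M" "1M"

# Rounding to one decimal keeps them distinct; as.character() drops a trailing ".0"
one_dp <- paste0(round(breaks / 1e6, digits = 1), "M")
print(one_dp)  # "1M" "1.2M" "1.4M"
```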
I made the following adjustment, and just wanted to share in case it helps anyone else. There's probably a more elegant way to do it, but this works for me.
addUnits <- function(n) {
  labels <- ifelse(n < 1000, n,                                     # less than thousands
            ifelse(n < 1e6, paste0(round(n/1e3), 'k'),              # in thousands
            ifelse(n < 1e9,
                   ifelse(round(n/1e6) == round(n/1e6, digits = 1),
                          paste0(round(n/1e6), 'M'),                # in millions
                          paste0(round(n/1e6, digits = 1), 'M')),   # in 1.x millions
            ifelse(n < 1e12,
                   ifelse(round(n/1e9) == round(n/1e9, digits = 1),
                          paste0(round(n/1e9), 'B'),                # in billions
                          paste0(round(n/1e9, digits = 1), 'B')),   # in 1.x billions
            ifelse(n < 1e15,
                   ifelse(round(n/1e12) == round(n/1e12, digits = 1),
                          paste0(round(n/1e12), 'T'),               # in trillions
                          paste0(round(n/1e12, digits = 1), 'T')),  # in 1.x trillions
                   'too big!'
            )))))
  return(labels)
}
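A more compact variant, offered as a sketch rather than a drop-in replacement: add_units_compact() is a hypothetical name, and it swaps the nested ifelse() calls for a lookup with findInterval(). Because as.character() already prints 1.0 as "1", rounding every value to one decimal place is enough to get both "1M" and "1.2M" without a separate whole-number branch. Unlike the function above, it also handles negative values, and it labels anything from 1e15 upwards in trillions instead of returning 'too big!'.

```r
# Hypothetical table-driven variant of addUnits (not from the original answers)
add_units_compact <- function(n, digits = 1) {
  # Lower bound of each unit and its suffix
  thresholds <- c(1, 1e3, 1e6, 1e9, 1e12)
  suffixes   <- c("", "k", "M", "B", "T")
  # findInterval() gives the index of the largest threshold <= abs(n);
  # values below 1 (including 0) return 0, so clamp them to the first unit
  idx <- pmax(findInterval(abs(n), thresholds), 1)
  # Scale by the chosen unit and round; as.character() drops a trailing ".0"
  out <- paste0(round(n / thresholds[idx], digits), suffixes[idx])
  # Keep NA inputs as NA instead of the string "NANA"
  out[is.na(n)] <- NA
  out
}

add_units_compact(c(0, 500, 2e5, 1e6, 1.2e6, 7.5e6, 1e7, 3e9))
# "0" "500" "200k" "1M" "1.2M" "7.5M" "10M" "3B"
```

It plugs into ggplot2 the same way as addUnits: scale_y_continuous(labels = add_units_compact).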
Upvotes: 0
Reputation: 515
As in many other situations when using ggplot2, I think the simplest way to do this is to manipulate the data before passing it to the ggplot() function. I would just create a new data column with the values in millions, like this:
library(dplyr)
library(ggplot2)
df <- data.frame(x = c(0, 200, 100),
                 y = c(7500000, 10000000, 2000000)) %>%
  mutate(y_millions = y / 1e6)
ggplot(df, aes(x = x, y = y_millions)) +
  geom_point() +
  labs(y = "y (in millions)")
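The question also needs the raw values for labelling individual points. A base R sketch (no dplyr), assuming comma-formatted raw values are wanted as point labels: keep the original column next to the rescaled one and build a label column from it. The column name y_label is hypothetical.

```r
# Sample data from the question
df <- data.frame(x = c(0, 200, 100),
                 y = c(7500000, 10000000, 2000000))

# Rescaled column for the axis
df$y_millions <- df$y / 1e6

# Point labels built from the original values, e.g. "7,500,000"
df$y_label <- formatC(df$y, format = "d", big.mark = ",")

df
```

The plot would then map y = y_millions for the axis and label = y_label in, say, geom_text(), so rescaling the axis does not change the point labels.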
Upvotes: 4
Reputation: 821
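In the scales package, the function label_number_si() automatically scales and labels with the best SI prefix: "K" for values ≥ 1e3, "M" for ≥ 1e6, "B" for ≥ 1e9, and "T" for ≥ 1e12. (In scales 1.2.0 and later, label_number_si() is deprecated; the equivalent is label_number(scale_cut = cut_short_scale()).)
See the scales documentation for details.
So: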
library(ggplot2)
ggplot(as.data.frame(list(x = c(0, 200, 100), y = c(7500000, 10000000, 2000000))),
       aes(x = x, y = y)) +
  geom_point() +
  expand_limits(x = c(0, NA), y = c(0, NA)) +
  scale_y_continuous(labels = scales::label_number_si())
Upvotes: 34
Reputation: 28331
I think you can just manually set your labels and breaks:
library(ggplot2)
ylab <- c(2.5, 5.0, 7.5, 10)
ggplot(as.data.frame(list(x = c(0, 200, 100), y = c(7500000, 10000000, 2000000))),
       aes(x = x, y = y)) +
  geom_point() +
  expand_limits(x = c(0, NA), y = c(0, NA)) +
  scale_y_continuous(labels = paste0(ylab, "M"),
                     breaks = 10^6 * ylab)
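Hardcoding ylab works for one dataset but must be updated whenever the data change. A minimal sketch that derives the breaks from the data with base R's pretty() instead, assuming the axis should start at 0 as in the expand_limits() call above; million_breaks() is a hypothetical helper.

```r
# Hypothetical helper: compute "nice" breaks from the data, then label them in millions
million_breaks <- function(y, n = 5) {
  # Include 0 so the breaks cover the range forced by expand_limits(y = c(0, NA))
  brks <- pretty(c(0, y), n = n)
  list(breaks = brks, labels = paste0(brks / 1e6, "M"))
}

b <- million_breaks(c(7500000, 10000000, 2000000))
b$labels
```

The result would be used as scale_y_continuous(breaks = b$breaks, labels = b$labels). Both vectors come from the same brks, so they always have the same length, which a character vector of labels requires.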
Edit: added a more generic solution
# Ref: https://5harad.com/mse125/r/visualization_code.html
addUnits <- function(n) {
  labels <- ifelse(n < 1000, n,                               # less than thousands
            ifelse(n < 1e6,  paste0(round(n/1e3), 'k'),       # in thousands
            ifelse(n < 1e9,  paste0(round(n/1e6), 'M'),       # in millions
            ifelse(n < 1e12, paste0(round(n/1e9), 'B'),       # in billions
            ifelse(n < 1e15, paste0(round(n/1e12), 'T'),      # in trillions
                   'too big!'
            )))))
  return(labels)
}
ggplot(as.data.frame(list(x = c(0, 200, 100, 250, 300),
                          y = c(500000, 1000000, 200000, 90000, 150000))),
       aes(x = x, y = y)) +
  geom_point() +
  expand_limits(x = c(0, NA), y = c(0, NA)) +
  scale_y_continuous(labels = addUnits)
Created on 2018-10-01 by the reprex package (v0.2.1.9000)
Upvotes: 23
Reputation: 771
I find scales::unit_format() to be more readable:
library(dplyr)
library(scales)
library(ggplot2)
as.data.frame(list(x = c(0, 200, 100),
                   y = c(7500000, 10000000, 2000000))) %>%
  ggplot(aes(x, y)) +
  geom_point() +
  expand_limits(x = c(0, NA), y = c(0, NA)) +
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6))
Upvotes: 67
Reputation: 18585
It's worth adding that a function passed to the labels argument can create labels without specifying the breaks argument. As stated in ?scale_y_continuous, labels can take one of:
- NULL for no labels
- waiver() for the default labels computed by the transformation object
- A character vector giving labels (must be same length as breaks)
- A function that takes the breaks as input and returns labels as output
Creating such a function is trivial:
(function(l) {paste0(round(l/1e6, 1), "m")})(5e6)
[1] "5m"
Hence the solution could be:
ggplot(as.data.frame(list(x = c(0, 200,100), y = c(7500000,10000000,2000000))),
aes(x = x, y = y)) +
geom_point() +
expand_limits( x = c(0,NA), y = c(0,NA)) +
scale_y_continuous(labels = function(l) {
paste0(round(l/1e6,1),"m")
})
There is no need to specify the breaks argument.
In the UK we tend to use a lowercase m.
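The anonymous function can be generalised into a small function factory, in the same spirit as the scales::label_*() helpers. label_millions() and its digits and suffix arguments are hypothetical names, not part of scales; the factory returns a function that ggplot2 can call with the breaks.

```r
# Hypothetical factory: returns a labelling function for values in millions
label_millions <- function(digits = 1, suffix = "m") {
  # Evaluate the arguments now so the returned closure captures their values
  force(digits)
  force(suffix)
  function(l) paste0(round(l / 1e6, digits), suffix)
}

lab <- label_millions()
lab(c(2e6, 7.5e6, 1e7))            # "2m" "7.5m" "10m"
label_millions(suffix = "M")(5e6)  # "5M"
```

It would be used as scale_y_continuous(labels = label_millions()), or label_millions(suffix = "M") where an uppercase M is preferred.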
Upvotes: 10