user33484
user33484

Reputation: 568

R - cut function with custom labels?

Example:

> x <- c(1e6,2e6,500e3,10e6,233e3,50e3)
> cut(x,c(0,250e3,500e3,1e6,2.5e6,99e6),dig.lab=10)
[1] (500000,1000000]   (1000000,2500000]  (250000,500000]    (2500000,99000000] (0,250000]        
[6] (0,250000]        
Levels: (0,250000] (250000,500000] (500000,1000000] (1000000,2500000] (2500000,99000000]

Is there a way I can use the cut function with custom labels? I would like it to show labels such as: (0,250k],(250k,500k],(500k,1m],(1m,2.5m],(2.5m,inf]

Is there a way to do this?

Upvotes: 1

Views: 1642

Answers (1)

r2evans
r2evans

Reputation: 160447

As @user20650 said, labels= will allow you to do it manually (and for that, it is a dupe question of Adding labels to cut function in R).

If you want a programmatic way to convert levels to this SI (or IEC) format, we can borrow from utils:::format.object_size and define our own function:

#' Kilo, Mega, Giga
#'
#' Convert numbers to SI or IEC format.
#' 
#' @param x numeric
#' @param standard one of "SI" (1000-based) or "IEC" (1024)
#' @param digits number of significant digits to round to
#' @param sep character, what to put between the number and the letter
#' @param suffix character what to put immediately after the K/M/G
#'   (e.g., "b" or "B")
#' @return character
#' @export
KMG <- function(x, standard = "SI", digits = 1L, sep = "", suffix = "") {
  known_bases <- c(legacy = 1024, IEC = 1024, SI = 1000)
  known_units <- list(SI = c("", "k", "M", "G", "T", "P", 
      "E", "Z", "Y"), IEC = c("", "Ki", "Mi", "Gi", 
      "Ti", "Pi", "Ei", "Zi", "Yi"), legacy = c("", "K", 
      "M", "G", "T", "P"))
  standard <- match.arg(standard, c("auto", names(known_bases)))
  if (is.null(digits)) 
      digits <- 1L
  base <- known_bases[[standard]]
  units_map <- known_units[[standard]]
  powers <- rep(0L, length(x))
  powers[x > 0] <- pmin(as.integer(log(x[x > 0], base = base)), length(units_map) - 1L)
  units <- paste0(units_map[powers + 1L], suffix)
  paste(round(x/base^powers, digits = digits), units, sep = sep)
}

(Edited to vectorize the function and remove "LEGACY", and add a suffix= argument.)

Its use:

x <- c(1e6,2e6,500e3,10e6,233e3,50e3)

cuts <- cut(x, c(0,250e3,500e3,1e6,2.5e6,99e6), dig.lab=10)
lvls <- levels(cuts)
lvls
# [1] "(0,250000]"         "(250000,500000]"    "(500000,1000000]"   "(1000000,2500000]"  "(2500000,99000000]"

gre <- gregexpr("[[:digit:].]+", lvls)
regmatches(lvls, gre) <- 
  lapply(regmatches(lvls, gre), function(lvl) KMG(as.numeric(lvl)))
lvls
# [1] "(0,250k]"    "(250k,500k]" "(500k,1M]"   "(1M,2.5M]"   "(2.5M,99M]" 
levels(cuts) <- lvls
cuts
# [1] (500k,1M]   (1M,2.5M]   (250k,500k] (2.5M,99M]  (0,250k]    (0,250k]   
# Levels: (0,250k] (250k,500k] (500k,1M] (1M,2.5M] (2.5M,99M]

Upvotes: 3

Related Questions