mikeck
mikeck

Reputation: 3788

Represent a numeric vector as a set of characters using :

In R, you can define an arbitrary integer sequence using :, e.g.

a = c(1:3, 12:14)
print(a)
## 1 2 3 12 13 14

I'm looking for a way to do the inverse operation, e.g. given a vector of integers I want to produce a character (or character vector) that collapses the integer sequence(s) to the equivalent expressions using :, e.g.

some_function (a)
## "1:3" "12:14"

Bonus if the stride can be detected, e.g. c(1, 3, 5) becomes "1:2:5" or something like that.

Motivation: generate an integer sequence in R based on some data manipulation to identify database row selection, and pass the most concise representation of that sequence to an external program in the proper format.

Upvotes: 1

Views: 80

Answers (3)

Onyambu
Onyambu

Reputation: 79338

We can be able to take into consideration the rle of the differences and paste the range together taking into consideration the sequence distance.

fun=function(s){
  m=c(0,diff(s))
  b=rle(m)
  b$values[b$lengths==1&b$values!=1]=0
  l=cumsum(!inverse.rle(b))
  d=function(x)paste0(range(x[,1]),
                      collapse = paste0(":",unique(x[-1,-1]),":"))
  f=c(by(cbind(s,m),l,d))
  sub("::.*","",sub(":1:",":",f))
}   

fun(c(1,1:3,12:14,c(1,3,5)))
      1       2       3       4 
    "1"   "1:3" "12:14" "1:2:5" 
fun(c(1, 3, 5, 8:10, 14, 17, 20))
        1         2         3 
  "1:2:5"    "8:10" "14:3:20" 

fun(1)
  1 
"1" 

Upvotes: 1

user3603486
user3603486

Reputation:

Ah, nerd heaven. Here's a first shot. You could even use this for encoding within R.

Needs testing; code always prints the stride out.

encode_ranges <- function (x) {
  rle_diff <- list(
    start = x[1],
    rled   = rle(diff(x))
  )

  class(rle_diff) <- "rle_diff"
  rle_diff
}

decode_ranges <- function (x) {
  stopifnot(inherits(x, "rle_diff"))
  cumsum(c(x$start, inverse.rle(x$rled)))
}

format.rle_diff <- function (x, ...) {
  stopifnot(inherits(x, "rle_diff"))
  output <- character(length(x$rled$values))

  start <- x$start
  for (j in seq_along(x$rled$values)) {
    stride <- x$rled$values[j]
    len    <- x$rled$lengths[j]
    if (len == 1L) {
      start <- end + stride
      next
    }
    end       <- start + stride * x$rled$lengths[j]
    output[j] <- paste(start, end, stride, sep = ":")
  }

  output <- output[nchar(output) > 0]
  paste(output, collapse = ", ")
}

print.rle_diff <- function (x, ...) cat(format(x, ...))

encode_ranges(c(1:3, 12:14))
encode_ranges(c(1, 3, 5, 8:10, 14, 17, 20))

Upvotes: 1

akrun
akrun

Reputation: 887891

We create a grouping variable with diff and cumsum, then use on the group by functions to paste the range of values

f1 <- function(vec) {
  unname(tapply(vec, cumsum(c(TRUE, diff(vec) != 1)), 
       FUN = function(x) paste(range(x), collapse=":")))
  }

f1(a) 
#[1] "1:3"   "12:14"

For the second case

b <- c(1, 3, 5)
un1 <- unique(diff(c(1, 3, 5)))
paste(b[1], un1, b[length(b)], sep=":")
#[1] "1:2:5"

Upvotes: 0

Related Questions