triub

Reputation: 95

Detect ranges from string of numbers

I have a vector that looks like this:

k <- c(3, 4, 5, 6, 7, 10, 11, 14, 17, 18, 19, 54, 55, 56, 59, 61)

How can I easily detect ranges of consecutive numbers, so that it becomes

3:7,10,11,14,17:19,54:56,59,61 

and save that in a new vector? Where there is a range (:), it would be good to save the median of that range instead, so the output would be

5,10,11,14,18,55,59,61

Is there a quick solution that can also handle vectors that are not ascending, for example turning 12,3,4,5,0,7

into 12,4,0,7?

Upvotes: 2

Views: 257

Answers (2)

akrun

Reputation: 887118

An option using vapply and range (base R functions only):

 f1 <- function(x) paste(unique(range(x)), collapse=":")
 vapply(split(k, cumsum(c(TRUE,diff(k)!=1))), f1, character(1L)) 
 #   1       2       3       4       5       6       7 
 # "3:7" "10:11"    "14" "17:19" "54:56"    "59"    "61" 
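
To see why this grouping works, it may help to look at the intermediate values. The following is only an illustrative sketch using the k from the question; the names breaks and grp are introduced here for explanation and are not part of the answer's code.

```r
k <- c(3, 4, 5, 6, 7, 10, 11, 14, 17, 18, 19, 54, 55, 56, 59, 61)

# TRUE wherever the step from the previous element is not exactly +1;
# the leading TRUE marks the first element as the start of a run
breaks <- c(TRUE, diff(k) != 1)

# Running count of run starts: all elements of one run share a group id
grp <- cumsum(breaks)
grp
# 1 1 1 1 1 2 2 3 4 4 4 5 5 5 6 7
```

split(k, grp) then cuts the vector at every change of group id, which is why each run ends up in its own list element.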

Or, if you need the median:

vapply(split(k, cumsum(c(TRUE,diff(k)!=1))), FUN= median, double(1L))
#   1    2    3    4    5    6    7 
# 5.0 10.5 14.0 18.0 55.0 59.0 61.0 
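
Note that 10 and 11 are grouped into one run here (median 10.5), whereas the expected output in the question keeps them as separate values. If the intent is to collapse only runs of three or more numbers (an assumption read off the question's example, not something the asker states), a possible sketch is shown below. The helper collapse_runs and its min_len argument are hypothetical names introduced for illustration.

```r
k <- c(3, 4, 5, 6, 7, 10, 11, 14, 17, 18, 19, 54, 55, 56, 59, 61)

# Hypothetical helper: replace a run by its median only when the run has
# at least `min_len` elements; shorter runs are kept element by element
collapse_runs <- function(x, min_len = 3) {
  grp <- cumsum(c(TRUE, diff(x) != 1))
  pieces <- lapply(split(x, grp), function(run) {
    if (length(run) >= min_len) median(run) else run
  })
  unlist(pieces, use.names = FALSE)
}

collapse_runs(k)
# 5 10 11 14 18 55 59 61

collapse_runs(c(12, 3, 4, 5, 0, 7))
# 12 4 0 7
```

With min_len = 2 the helper should behave like the plain median call above, so the threshold is the only design choice added here.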

For big vectors, as @David Arenburg mentioned in the comments, some data.table options are

 library(data.table)
 as.data.table(k)[, median(k), cumsum(c(TRUE, diff(k) != 1))]
 as.data.table(k)[, paste(unique(range(k)), collapse = ":"), 
               cumsum(c(TRUE, diff(k) != 1))]

Update

Using the new vector "k1"

k1 <- c(12, 3, 4, 5, 0, 7)
vapply(split(k1, cumsum(c(TRUE, diff(k1) != 1))), FUN = median, double(1L))
#  1  2  3  4 
# 12  4  0  7 

as.data.table(k1)[, median(k1), cumsum(c(TRUE, diff(k1) != 1))]
#    cumsum V1
# 1:      1 12
# 2:      2  4
# 3:      3  0
# 4:      4  7

Upvotes: 2

G. Grothendieck

Reputation: 269624

1) Try this:

tapply(k, cumsum(c(TRUE, diff(k) != 1)), median)

giving:

   1    2    3    4    5    6    7 
 5.0 10.5 14.0 18.0 55.0 59.0 61.0 

2) Also try this:

f <- function(x) if (length(x) == 1) x else paste(x[1], x[length(x)], sep = ":")
tapply(k, cumsum(c(TRUE, diff(k) != 1)), f)

giving:

      1       2       3       4       5       6       7 
  "3:7" "10:11"    "14" "17:19" "54:56"    "59"    "61" 

3) and this:

tapply(k, cumsum(c(TRUE, diff(k) != 1)), toString)

giving this:

              1               2               3               4               5 
"3, 4, 5, 6, 7"        "10, 11"            "14"    "17, 18, 19"    "54, 55, 56" 
              6               7 
           "59"            "61" 

4) and this:

split(k, cumsum(c(TRUE, diff(k) != 1)))

giving:

$`1`
[1] 3 4 5 6 7

$`2`
[1] 10 11

$`3`
[1] 14

$`4`
[1] 17 18 19

$`5`
[1] 54 55 56

$`6`
[1] 59

$`7`
[1] 61

None of the above require any external packages.
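
Since the question asks for the result to be saved in a new vector, it may be worth noting that tapply returns a one-dimensional array labelled by group id rather than a plain vector. A minimal sketch of dropping those labels follows; the variable names grp, med and med_vec are illustrative only.

```r
k <- c(3, 4, 5, 6, 7, 10, 11, 14, 17, 18, 19, 54, 55, 56, 59, 61)

grp <- cumsum(c(TRUE, diff(k) != 1))

# tapply returns a 1-d array whose dimnames are the group ids
med <- tapply(k, grp, median)

# as.vector() strips the dim and dimnames, leaving a plain numeric vector
med_vec <- as.vector(med)
med_vec
# 5.0 10.5 14.0 18.0 55.0 59.0 61.0
```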

Upvotes: 5
