biohazard

Reputation: 2047

increment from 0 to n over the length of a vector in R

Let's say I have a vector of length 5. The contents could be anything.

> v1 <- c(0,0,0,0,0)
> length(v1)
[1] 5

And let's say I want to create a vector of the same length, with evenly spaced values ranging from 0 to 100, as follows:

> v2 <- c(0,25,50,75,100)
> length(v2)
[1] 5

This example was pretty straightforward, but now I would like to write a function that would allow me to do this for a vector of any length. Here is the code I have written:

percentile <- function(N) {
  l <- length(N)
  v <- 0
  i = 0
  while (as.integer(i) < 100) {
    i = i + 100/(l - 1)
    v <- c(v, i)
  }
  v
}

If I try it on the vector v1 described above, the results are as expected:

> percentile(v1)
[1]   0  25  50  75 100

However, things go weird for more "complicated" lengths, for instance if I try it on a vector v3 of length 1357:

> v3 <- c(1:1357)
> length(v3)
[1] 1357
>
> length(percentile(v3))
[1] 1358

First of all, the resulting vector is too long in some cases. Depending on the length of the input vector, the result can have one or two extra elements. This does not seem to depend on how large the length is. In these cases, the last element of the percentile() vector is always greater than 100:

> percentile(v3)
   [1]   0.00000000   0.07374631   0.14749263   0.22123894   0.29498525
   .......
[1356]  99.92625369 100.00000000 100.07374631

Is there something messy with my handling of floats/integers? How can I improve my function so that it will work with vectors of any length? Any help is appreciated.

Upvotes: 1

Views: 3679

Answers (2)

Carl Witthoft

Reputation: 21532

Take a look at seq. You can specify either the increment (by) or the number of elements (length.out) of your desired sequence; length= in the calls below is matched to length.out by partial argument matching. As a simple example:

Rgames> seq(0,100,length=5)
[1]   0  25  50  75 100
Rgames> seq(0,100,length=37)
 [1]   0.000000   2.777778   5.555556   8.333333  11.111111  13.888889
 [7]  16.666667  19.444444  22.222222  25.000000  27.777778  30.555556
[13]  33.333333  36.111111  38.888889  41.666667  44.444444  47.222222
[19]  50.000000  52.777778  55.555556  58.333333  61.111111  63.888889
[25]  66.666667  69.444444  72.222222  75.000000  77.777778  80.555556
[31]  83.333333  86.111111  88.888889  91.666667  94.444444  97.222222
[37] 100.000000
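
For comparison, here is a small sketch of the two ways of calling seq: fixing the number of elements versus fixing the increment. The variable names by_count and by_step are made up for illustration only.

```r
# Fix the number of elements; seq works out the step itself.
by_count <- seq(0, 100, length.out = 5)

# Fix the increment instead; the length follows from the step.
by_step <- seq(0, 100, by = 25)

print(by_count)
print(by_step)
```

When the goal is "same length as another vector", length.out is usually the more convenient of the two, since no step has to be computed by hand.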

Upvotes: 2

flodel

Reputation: 89097

Yes, most likely a floating point issue. This should do it:

percentile <- function(N) seq(from = 0, to = 100, length.out = length(N))

Indeed:

length(v3)
# [1] 1357
length(percentile(v3))
# [1] 1357
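
To make the floating-point explanation a bit more concrete: as.integer() truncates, so a running sum that ends up a hair below 100 still passes the `< 100` test in the original loop, and one more element gets appended. The sketch below illustrates the truncation and shows one possible alternative that derives each value from an integer index instead of a running sum. The name percentile_by_index is hypothetical, and the sketch assumes an input of length 2 or more.

```r
# as.integer() truncates toward zero, so a value just below 100 becomes 99.
as.integer(100 - 1e-9) < 100   # TRUE

# Hypothetical alternative: compute each value directly from its index,
# so rounding error cannot build up and the length is fixed up front.
percentile_by_index <- function(N) {
  l <- length(N)
  stopifnot(l >= 2)            # assumption: l - 1 must not be zero
  (seq_len(l) - 1) * 100 / (l - 1)
}

p <- percentile_by_index(1:1357)
length(p)
```

In practice the seq version above is shorter and does the same job; this variant is only meant to show where the original loop goes wrong.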

Upvotes: 3
