Reputation: 2047
Let's say I have a vector of length 5. The contents could be anything.
> v1 <- c(0,0,0,0,0)
> length(v1)
[1] 5
And let's say I want to create a vector of the same length, with equally spaced values ranging from 0 to 100, as follows:
> v2 <- c(0,25,50,75,100)
> length(v2)
[1] 5
This example was pretty straightforward, but now I would like to write a function that would allow me to do this for a vector of any length. Here is the code I have written:
percentile <- function(N) {
  l <- length(N)
  v <- 0
  i = 0
  while (as.integer(i) < 100) {
    i = i + 100/(l - 1)
    v <- c(v, i)
  }
  v
}
If I try it on the vector v1 described above, the results are as expected:
> percentile(v1)
[1] 0 25 50 75 100
However, things go weird for more "complicated" lengths, for instance if I try it on a vector v3 of length 1357:
> v3 <- c(1:1357)
> length(v3)
[1] 1357
>
> length(percentile(v3))
[1] 1358
First of all, the resulting vector is too long in some cases. Depending on the length of the initial vector, it can be one or two elements longer than expected. This does not seem to depend on how big the length is. In these cases, the last element of the percentile() vector is always bigger than 100:
> percentile(v3)
[1] 0.00000000 0.07374631 0.14749263 0.22123894 0.29498525
.......
[1356] 99.92625369 100.00000000 100.07374631
Is there something messy with my handling of floats/integers? How can I improve my function so that it will work with vectors of any length? Any help is appreciated.
Upvotes: 1
Views: 3679
Reputation: 21532
Take a look at seq. You can specify the increment (by) or the number of elements (length.out) of your desired sequence. As a simple example:
Rgames> seq(0,100,length=5)
[1] 0 25 50 75 100
Rgames> seq(0,100,length=37)
[1] 0.000000 2.777778 5.555556 8.333333 11.111111 13.888889
[7] 16.666667 19.444444 22.222222 25.000000 27.777778 30.555556
[13] 33.333333 36.111111 38.888889 41.666667 44.444444 47.222222
[19] 50.000000 52.777778 55.555556 58.333333 61.111111 63.888889
[25] 66.666667 69.444444 72.222222 75.000000 77.777778 80.555556
[31] 83.333333 86.111111 88.888889 91.666667 94.444444 97.222222
[37] 100.000000
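To make the two options concrete, here is a minimal sketch contrasting a fixed number of elements with a fixed increment. The variable names by_count, by_step and long are only illustrative, and length= in the calls above is a partial match for the full argument name length.out.

```r
# Fix the number of elements; seq() derives the spacing.
by_count <- seq(0, 100, length.out = 5)

# Fix the spacing; seq() derives the number of elements.
by_step <- seq(0, 100, by = 25)

# Both calls describe the same five points.
print(by_count)
print(by_step)

# With length.out the length is guaranteed, whatever the spacing works out to.
long <- seq(0, 100, length.out = 1357)
print(length(long))
print(range(long))
```

When the goal is to match the length of another vector, length.out is the more direct choice, since the length is then stated rather than implied by the step size.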
Upvotes: 2
Reputation: 89097
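Yes, this is most likely a floating-point issue: repeatedly adding 100/(l - 1) accumulates rounding error, so the running total need not land exactly on 100. This should do it: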
percentile <- function(N) seq(from = 0, to = 100, length.out = length(N))
Indeed:
length(v3)
# [1] 1357
length(percentile(v3))
# [1] 1357
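For anyone curious about the mechanism, the following is a rough sketch of what the original while loop does for a length of 1357. It assumes the same step of 100/(n - 1); the names n, step, i and by_index are only illustrative. After n - 1 additions the running total is very close to 100 but, judging by the output in the question, slightly below it, and as.integer() truncates toward zero, so the loop condition is still true and one extra element gets appended.

```r
n <- 1357
step <- 100 / (n - 1)

# Accumulate the step n - 1 times, as the while loop in the question does.
i <- 0
for (k in seq_len(n - 1)) i <- i + step

# Print the running total at full precision; it need not be exactly 100.
print(i, digits = 17)

# as.integer() truncates toward zero, so a value just below 100 becomes 99.
print(as.integer(i))

# Computing each value from its index avoids accumulating error:
# the last element is (n - 1) * 100 / (n - 1), which is exactly 100.
by_index <- (0:(n - 1)) * 100 / (n - 1)
print(length(by_index))
print(by_index[n])
```

The index-based line is one way to write the loop-free version by hand; seq() with length.out gives the same guarantee on length and is easier to read.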
Upvotes: 3