Reputation: 41
I'm given arrays of numbers between 1 and 4, but within an array the min and max usually differ by no more than .5. Distinct values differ by at least .1. I want to find the smallest margin that contains at least 90% (or some other specified rate) of the elements.
That is, given the array
c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
I want my function to return .4 because 2.3 - 1.9 = .4 < 2.3 - 1 = 1.3. Details:
I tried to build the function a few times, but it keeps growing overly complicated, and I'm wondering if there's a simple way to do this that I haven't considered.
Edit: it has to be able to handle skewed distributions. I don't have any completed examples of code I produced since I keep reconstructing it, but I'll make something and post it.
Edit2: I can't provide any examples of the arrays I want to feed into the function, but here's a function for generating similar values. It's not important that the values don't fall in the 1 to 4 range as long as it works.
x = round(rbeta(20,5,2)*100)/10
Upvotes: 3
Views: 1096
Reputation: 66819
Here's one way (same as @Aaron's, except using head/tail instead of x[i]):
x = c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
xn= length(x)
# number of elements to drop
n = round(0.1*xn)
# achievable ranges
v = tail(x, n+1) - head(x, n+1)
min(v)
# [1] 0.4
Confirmation that a subvector of x dropping n elements really has this range:
n_up = which.min(v) - 1
n_dn = n-n_up
xs = x[(1 + n_up):(xn - n_dn)]
diff(range(xs))
# [1] 0.4
length(x) - length(xs) == n
# [1] TRUE
Testing on new example:
set.seed(1)
x0 = round(rbeta(20,5,2)*100)/10
x = sort(x0)
xn= length(x)
n = round(0.1*xn)
v = tail(x, n+1) - head(x, n+1)
min(v)
# [1] 4.1
# confirm...
n_up = which.min(v) - 1
n_dn = n-n_up
xs = x[(1 + n_up):(xn - n_dn)]
diff(range(xs))
# [1] 4.1
length(x) - length(xs) == n
# [1] TRUE
Partial sorting might be sufficient (just to get the top and bottom values on the ends); see ?sort.
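A minimal sketch of that partial-sorting idea, assuming the same generated data as in the test above. The names idx and xp are illustrative only. It relies on the partial argument of sort, which places the listed positions at their fully sorted values without ordering everything in between:

```r
set.seed(1)
x0 <- round(rbeta(20, 5, 2) * 100) / 10
xn <- length(x0)
n  <- round(0.1 * xn)   # number of elements to drop

# Only the n+1 smallest and n+1 largest positions need their sorted values
# (unique() guards against overlap when the vector is very short)
idx <- unique(c(seq_len(n + 1), (xn - n):xn))
xp  <- sort(x0, partial = idx)

# Same computation as before, on the partially sorted vector
v <- tail(xp, n + 1) - head(xp, n + 1)
min(v)
```

The result should agree with the fully sorted version, since head and tail only look at positions that the partial sort has fixed.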
Upvotes: 4
Reputation: 37764
The easiest way will be to brute force by testing all possible ranges that include 90%. To do this, we figure out how many terms that is, and what indices the ranges therefore can start at, and compute the difference for each, and then the minimum of those.
x <- c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
n <- ceiling(length(x)*0.9) # get the number of terms needed to include 90%
k <- 1 : (length(x) - n + 1) # get the possible indices the range can start at
x <- sort(x) # need them sorted...
d <- x[k + n - 1] - x[k] # get the difference starting at each range
min(d) # get the smallest difference
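For reuse with other rates, the steps above could be wrapped in a function. This is only a sketch: the name smallest_margin, the rate argument, and the returned lower/upper bounds are hypothetical additions, not part of the answer itself.

```r
# Hypothetical helper wrapping the brute-force steps above
smallest_margin <- function(x, rate = 0.9) {
  stopifnot(length(x) > 0, rate > 0, rate <= 1)
  x <- sort(x)                        # windows are taken over sorted values
  n <- ceiling(length(x) * rate)      # number of terms needed to include `rate`
  k <- seq_len(length(x) - n + 1)     # possible start indices of the window
  d <- x[k + n - 1] - x[k]            # width of the window starting at each k
  i <- which.min(d)                   # first narrowest window
  list(margin = d[i], lower = x[i], upper = x[i + n - 1])
}

x <- c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
res <- smallest_margin(x)
res$margin   # approximately 0.4, for the window from 1.9 to 2.3
```

Because every window of the required size is checked, the narrowest one is found wherever it sits, which is what makes this work for skewed data as well.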
Upvotes: 5
Reputation: 76470
This can be solved with quantile.
Compute the 0.05 and 0.95 quantiles.
Keep the elements of x that are within those limits. Call this vector in_90.
Take the difference of the range of in_90.
The sequence of instructions would be this.
qq <- quantile(x, c(0.05, 0.95))
in_90 <- x[qq[1] <= x & x <= qq[2]]
diff(range(in_90))
#[1] 0.4
As a function:
amplitude <- function(x, conf = 0.9){
quants <- c((1 - conf)/2, 1 - (1 - conf)/2)
qq <- quantile(x, quants)
inside <- x[qq[1] <= x & x <= qq[2]]
diff(range(inside))
}
amplitude(x)
#[1] 0.4
Data.
x <- c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
Upvotes: 1