Federico Giorgi

Reputation: 10735

Reduce total sum of vector elements in R

In R, I have a vector of integers. I would like to randomly reduce the value of each element so that the sum of the vector becomes a given percentage of the initial sum.

In the example below, I would like to reduce the vector "x" to a vector "y", where each element has been randomly reduced so that the sum of the elements equals 50% of the initial sum.

The resulting vector should have values that are non-negative and no greater than the original values.

set.seed(1)
perc<-50
x<-sample(1:5,10,replace=TRUE)
xsum<-sum(x) # sum is 33
toremove<-floor(xsum*perc*0.01) # 16, the amount to remove
x # 2 2 3 5 2 5 5 4 4 1

y<-magicfunction(x,perc)
y # 0 2 1 4 0 3 2 1 2 1
sum(y) # sum is 16 (rounded half of 33)

Can you think of a way to do it? Thanks!

Upvotes: 4

Views: 598

Answers (3)

Federico Giorgi

Reputation: 10735

An alternative solution is this function, which removes counts from the vector at positions sampled with probability proportional to the current element size. Elements are clamped at zero, and the loop keeps removing the remaining shortfall until the total reduction is within 1% of the target.

removereads<-function(x,perc=NULL){
    xsum<-sum(x)
    toremove<-floor(xsum*perc) # total amount to remove
    toremove2<-toremove        # amount still left to remove
    y<-x
    irem<-1
    # iterate until the removed total is within 1% of the target
    while(toremove2>(toremove*0.01)){
        message("Downsampling iteration ",irem)
        # draw removals with probability proportional to current element size
        tmp<-sample(seq_along(y),toremove2,prob=y,replace=TRUE)
        tmp2<-table(tmp)
        common<-as.numeric(names(tmp2))
        y[common]<-y[common]-tmp2
        # clamp at zero; the resulting shortfall is removed in later iterations
        y[y<0]<-0
        toremove2<-toremove-(xsum-sum(y))
        irem<-irem+1
    }
    return(y)
}
set.seed(1)
x<-sample(1:1000,10000,replace=TRUE)
perc<-0.9 # fraction of the total sum to remove
y<-removereads(x,perc)
plot(x,y,xlab="Before reduction",ylab="After reduction")
abline(0,1)
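As a quick sanity check (an illustrative addition, not part of the original answer), the remaining sum should land close to the 10% that survives the reduction, and no element should grow or go negative:

# Sanity check: about 10% of the total should remain
sum(y)/sum(x)      # should be close to 0.1 (within the 1% stopping tolerance)
all(y>=0 & y<=x)   # should be TRUE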

And the graphical results: [scatterplot "Downsampling R vector", values before vs. after reduction, with the y = x line]

Upvotes: 3

mickey

Reputation: 2188

Here's a solution which uses draws from the Dirichlet distribution:

set.seed(1)
x = sample(10000, 1000, replace = TRUE)

magic = function(x, perc, alpha = 1){
    # draw p from a Dirichlet distribution (normalized gammas):
    # sum(p) == 1, smaller elements are reduced by less on average,
    # and a larger alpha makes the result more "random"
    p = rgamma(length(x), x / alpha, 1)
    p = p / sum(p)

    # scale p up so that subtracting it from x
    # gets close to the desired sum
    reduce = round(p * (sum(x) - sum(round(x * perc))))
    y = x - reduce

    # no negatives
    y = pmax(y, 0)

    return(y)
}

alpha = 500
perc = 0.7
target = sum(round(perc * x))
y = magic(x, perc, alpha)

# Hopefully close to 1
sum(y) / target
# [1] 1.000048

# Measure of the "randomness"
sd(y / x)
# [1] 0.1376637

Basically, it tries to figure out how much to reduce each element by while still getting close to the desired sum. You can control how "random" the new vector is by increasing alpha; see the sketch below.
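For instance (a hypothetical comparison, not from the original answer; the exact values depend on the seed), the spread of the reduction ratios y/x grows with alpha:

# Illustrative: compare the spread of y/x for two values of alpha
set.seed(1)
x = sample(10000, 1000, replace = TRUE)
sd(magic(x, 0.7, alpha = 50) / x)   # small alpha: ratios cluster near 0.7
sd(magic(x, 0.7, alpha = 5000) / x) # large alpha: ratios spread out more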

Upvotes: 1

Julius Vainora

Reputation: 48211

Assuming that x is long enough, we may rely on an appropriate law of large numbers (also assuming that x is regular enough in certain other ways). For that purpose we generate values of another random variable Z taking values in [0,1] with mean perc; since E[x_i * Z_i] = perc * x_i, the sum of the rescaled elements concentrates around perc * sum(x).

set.seed(1)
perc <- 50 / 100
x <- sample(1:10000, 1000)
sum(x)
# [1] 5014161
# Z ~ Beta(perc / (3 * (1 - perc)), 1 / 3), whose mean is perc
x <- round(x * rbeta(length(x), perc / 3 / (1 - perc), 1 / 3))
sum(x)
# [1] 2550901
sum(x) * 2
# [1] 5101802
sum(x) * 2 / 5014161
# [1] 1.017479 # about 1.7% deviation

Here for Z I chose a certain beta distribution with mean perc, but you could pick some other distribution too. The lower the variance of Z, the more precise the result. For instance, the following is much better, since the previously chosen beta distribution (Beta(1/3, 1/3) when perc = 0.5) is, in fact, bimodal (see the density sketch after the example):

set.seed(1)
perc <- 50 / 100
x <- sample(1:1000, 100)
sum(x)
# [1] 49921
# Z ~ Beta(100 * perc / (1 - perc), 100): mean perc, much smaller variance
x <- round(x * rbeta(length(x), 100 * perc / (1 - perc), 100))
sum(x)
# [1] 24851
sum(x) * 2
# [1] 49702
sum(x) * 2 / 49921
# [1] 0.9956131 # Less than 0.5% deviation!
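To see the difference between the two choices of Z, one can sketch their densities for perc = 0.5 (an illustrative aside, not part of the original answer):

# Densities of the two choices of Z at perc = 0.5:
# Beta(1/3, 1/3) piles up near 0 and 1 (bimodal),
# Beta(100, 100) concentrates tightly around 0.5
curve(dbeta(x, 1/3, 1/3), from = 0.01, to = 0.99, ylim = c(0, 12),
      xlab = "z", ylab = "density")
curve(dbeta(x, 100, 100), add = TRUE, lty = 2)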

Upvotes: 5
