user7867665
user7867665

Reputation: 882

How to generate random numbers at the tails of an exponential distribution?

I want to generate random numbers like from np.random.exponential but clipped / truncated at values a,b. For example, if a=100, b=500 then I want the function to generate random numbers following e^(-x) in the range [100, 500].

An inefficient way would be:

rands = np.random.exponential(size=10**7)
rands = rands[(rands>a) and (rands<b)]

Is there an existing package that can do this for me? Ideally for various distributions, not just exponential.

Upvotes: 2

Views: 1392

Answers (1)

lifezbeautiful
lifezbeautiful

Reputation: 1337

If we clip the values after using the exponential generator, there are two problems with approach proposed in the question.

  • First, we lose values (For example, if we wanted 10**7 values, we might only get 10^6 values)
  • Second, np.random.exponential() returns values between 0 and 1, so we can't simply use 100 and 500 as the lower and upper bounds. We must scale the generated random numbers before scaling.

I wrote the workaround using exp(uniform). I tested your solution using smaller values of a and b (so that we don't get empty arrays). A timed approach shows this is faster by around 50%

import time
import numpy as np
import matplotlib.pyplot as plt

   

def truncated_exp_OP(a,b, how_many):
    rands = np.random.exponential(size=how_many)
    rands = rands[(rands>a) & (rands<b)]
    return rands


def truncated_exp_NK(a,b, how_many):
    a = -np.log(a)
    b = -np.log(b)
    rands = np.exp(-(np.random.rand(how_many)*(b-a) + a))
    return rands

timeTakenOP = []
for i in range(20):
    startTime = time.time()
    r = truncated_exp_OP(0.001,0.39, 10**7)
    endTime = time.time() 
    timeTakenOP.append(endTime - startTime)
print ("OP solution: ", np.mean(timeTakenOP))
plt.hist(r.flatten(), 300);
plt.show()



timeTakenNK = []
for i in range(20):
    startTime = time.time()
    r = truncated_exp_NK(100,500, 10**7)
    endTime = time.time()
    timeTakenNK.append(endTime - startTime)
print ("NK solution: ", np.mean(timeTakenNK))
plt.hist(r.flatten(), 300);
plt.show()

Average run time :

OP solution:  0.28491891622543336 vs 
NK solution:  0.1437338709831238

The histogram plots of the random numbers are shown below:

OP's approach:

enter image description here

This approach:

enter image description here

Upvotes: 1

Related Questions