I want to generate a dataset of 30 entries in the range 50 to 5000 that follows an increasing, log-like curve, i.e. it rises quickly at the start and then levels off at the end.
I came across from scipy.stats import expon, but I am not sure how to use that package in my scenario.
Can anyone help?
A possible output would look like [300, 1000, 1500, 1800, 1900, ...].
First, generate 30 random x values (uniformly) and sort them, then compute log(x). Ideally, log(x) itself would lie in the range [50, 5000). However, that would require e^50 <= x < e^5000, and e^5000 overflows a 64-bit float. A possible solution is to generate random x values in [min_x, max_x), take their logarithms, and then linearly rescale them to the desired range [50, 5000).
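Written out, the rescaling step is a min-max normalization of log(x) followed by a stretch to the target range (y_min = 50, y_max = 5000, with x_min and x_max chosen freely):

$$
y = y_{\min} + \left(y_{\max} - y_{\min}\right)\,\frac{\ln x - \ln x_{\min}}{\ln x_{\max} - \ln x_{\min}}
$$

At x = x_min this gives y = y_min, and as x approaches x_max it approaches y_max. Because ln is concave, equal steps in x yield large increments of y at the start and small ones at the end, which is the "increasing, then stagnant" shape asked for; a larger ratio x_max / x_min makes the bend sharper.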
import numpy as np
min_y = 50
max_y = 5000
min_x = 1
# any max_x > min_x can be chosen;
# it controls the shape of the curve: a larger max_x / min_x ratio gives a sharper bend
max_x = 10
# generate (uniformly) and sort 30 random float x in [min_x, max_x)
x = np.sort(np.random.uniform(min_x, max_x, 30))
# get log(x), i.e. values in [log(min_x), log(max_x))
log_x = np.log(x)
# scale log(x) to the new range [min_y, max_y)
y = (max_y - min_y) * ((log_x - np.log(min_x)) / (np.log(max_x) - np.log(min_x))) + min_y
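Since the question mentions scipy.stats.expon: the exponential CDF, 1 - exp(-t / scale), also rises steeply and then flattens, so it can serve as the curve shape instead of log. The sketch below is one possible alternative under that assumption. It evaluates 1 - np.exp(-t / scale) with plain NumPy, which is what scipy.stats.expon.cdf(t, scale=scale) returns for t >= 0. The function name saturating_sequence and its span parameter are hypothetical choices for illustration.

```python
import numpy as np


def saturating_sequence(n=30, min_y=50, max_y=5000, scale=1.0, span=5.0, seed=None):
    """Hypothetical helper: n sorted values in [min_y, max_y) that rise fast, then level off.

    The shape comes from the exponential CDF F(t) = 1 - exp(-t / scale),
    the same function scipy.stats.expon.cdf(t, scale=scale) evaluates for t >= 0.
    """
    rng = np.random.default_rng(seed)
    # uniform random "time" points in [0, span), sorted so the output is increasing
    t = np.sort(rng.uniform(0.0, span, n))
    # exponential CDF: steep near t = 0, nearly flat once t is several times scale
    f = 1.0 - np.exp(-t / scale)
    # divide by F(span) so the curve covers [0, 1) before stretching to [min_y, max_y)
    f_span = 1.0 - np.exp(-span / scale)
    return min_y + (max_y - min_y) * f / f_span


y = saturating_sequence(seed=0)
print(np.round(y).astype(int))
```

Here span plays the same role as max_x in the log version: a larger span relative to scale makes the curve level off earlier. To get integers like the example output, round with np.round(y).astype(int); note that rounding may produce repeated values in the flat tail.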