dper

Reputation: 904

Generate data based on exponential distribution

I want to generate a dataset of 30 entries in the range 50 to 5000 that follows an increasing, logarithm-like curve, i.e. rising quickly at the start and then flattening out towards the end.

I came across scipy.stats.expon (from scipy.stats import expon), but I am not sure how to use it in my scenario.

Can anyone help?

A possible output would look like [300, 1000, 1500, 1800, 1900, ...].

Upvotes: 1

Views: 247

Answers (1)

Riccardo Bucco

Reputation: 15364

First, generate 30 random x values from a uniform distribution, then take log(x). Ideally log(x) would already fall in [50, 5000), but that would require e^50 <= x < e^5000, and e^5000 overflows a floating-point number. A workable alternative is to draw x uniformly from [min_x, max_x), take the logarithm, and then rescale the result linearly to the desired range [50, 5000).

import numpy as np

min_y = 50
max_y = 5000
min_x = 1
# any max_x > min_x can be chosen; it controls how strongly the
# logarithm bends, and therefore the shape of the final distribution
max_x = 10

# generate (uniformly) and sort 30 random float x in [min_x, max_x)
x = np.sort(np.random.uniform(min_x, max_x, 30))
# get log(x), i.e. values in [log(min_x), log(max_x))
log_x = np.log(x)
# scale log(x) to the new range [min_y, max_y)
y = (max_y - min_y) * ((log_x - np.log(min_x)) / (np.log(max_x) - np.log(min_x))) + min_y
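
If a reproducible or perfectly smooth curve is preferred, the same idea can be wrapped in a small helper. This is only a sketch under assumptions: the function name log_curve and its seed parameter are made up for illustration and are not part of the answer above. With seed=None it uses evenly spaced x values (np.linspace) instead of random ones, so the output starts exactly at min_y and ends at max_y; with a seed it draws sorted uniform samples from a seeded generator.

```python
import numpy as np


def log_curve(n=30, min_y=50.0, max_y=5000.0, min_x=1.0, max_x=10.0, seed=None):
    """Return n increasing values in [min_y, max_y] that follow a log curve.

    Hypothetical helper: seed=None gives evenly spaced x values,
    any integer seed gives sorted uniform random x values.
    """
    if seed is None:
        # deterministic: evenly spaced x in [min_x, max_x]
        x = np.linspace(min_x, max_x, n)
    else:
        # random but reproducible: sorted uniform x in [min_x, max_x)
        rng = np.random.default_rng(seed)
        x = np.sort(rng.uniform(min_x, max_x, n))

    # map log(x) from [log(min_x), log(max_x)] onto [0, 1]
    unit = (np.log(x) - np.log(min_x)) / (np.log(max_x) - np.log(min_x))
    # stretch [0, 1] onto [min_y, max_y]
    return (max_y - min_y) * unit + min_y


y = log_curve()
print(np.round(y).astype(int))
```

A larger max_x makes the curve rise more steeply at the start and flatten sooner, so it may be worth trying a few values (for example 10, 100, 1000) and plotting the result.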

Upvotes: 1
