Reputation: 2489
I'd like to know how I can generate some random data whose plot resembles a "training curve." By training curve, I mean an array of training loss values from a learning model. These typically have larger values and variance at the beginning, and over time converge to some value with very little variance. It looks a bit like a noisy exponential curve.
This is the closest I've gotten to making random data that resembles a training curve. The problems are that the curve does not flatten out or converge like true loss curves, and there is too much variance on the flatter part.
import numpy as np
import matplotlib.pyplot as plt
num_iters = 2000
rand_curve = np.sort(np.random.exponential(size=num_iters))[::-1]
noise = np.random.normal(0, 0.2, num_iters)
signal = rand_curve + noise
noisy_curve = signal[signal > 0]
plt.plot(noisy_curve, c='r', label='random curve')
And here is an actual training loss curve for reference.
I do not know enough about probability distributions to know if this is a stupid question. I only wanted to generate a random curve so that others had a data array to work with to help me with another question I have about logarithmic plots in matplotlib.
Upvotes: 2
Views: 2566
Reputation: 3213
It seems like you could dampen each noise value in proportion to how far along the x axis it is. In this case, that means the variance would decrease as the curve flattens. Something like:
import numpy as np
import matplotlib.pyplot as plt
num_iters = 2000
rand_curve = np.sort(np.random.exponential(size=num_iters))[::-1]
noise = np.random.normal(0, 0.2, num_iters)
# Dampen the noise linearly: full amplitude at index 0, zero at the last index.
# (Subtracting the index would shift the values instead of shrinking their spread.)
dampener = 1 - np.arange(num_iters) / (num_iters - 1)
noise = noise * dampener
signal = rand_curve + noise
noisy_curve = signal[signal > 0]
plt.plot(noisy_curve, c='r', label='random curve')
With this, the noise values should get smaller the further along x you go, which should achieve the result you want!
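As a rough sketch of the same dampening idea, the noise can be multiplied by an envelope that decays along x, and the sorted random samples can be swapped for a smooth exponential so the curve actually flattens out. This is only one way to do it: the names `tau` and `floor`, the noise levels, and the fixed seed are assumptions picked for illustration, not values from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

num_iters = 2000
x = np.arange(num_iters)

# Assumed shape: exponential decay from 1.0 towards a floor of 0.05
tau = 200.0    # hypothetical decay constant, in iterations
floor = 0.05   # hypothetical converged loss value
curve = floor + (1.0 - floor) * np.exp(-x / tau)

# Noise whose standard deviation shrinks along x, from 0.2 down to 0.005
noise_std = 0.005 + 0.195 * np.exp(-x / tau)
noisy_curve = curve + rng.normal(0.0, 1.0, num_iters) * noise_std

# Clip instead of dropping points so the array keeps its length
noisy_curve = np.clip(noisy_curve, 1e-6, None)
```

Passing `noisy_curve` to `plt.plot` should give a curve that is noisy at the start and settles near `floor`. Clipping rather than filtering with `signal > 0` keeps one value per iteration, which matters if the x axis is meant to be the iteration count.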
Upvotes: 0
Reputation: 8538
Here is an illustration of how to do it with a gamma distribution for the noise:
import numpy as np
import scipy.stats
x = np.arange(2000)
y = 0.00025 + 0.001 * np.exp(-x/100.) + scipy.stats.gamma(3).rvs(len(x))*(1-np.exp(-x/100))*2e-5
You can adjust the parameters here to reduce the amount of noise, change the decay rate, and so on.
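To make those parameters easier to experiment with, the same formula could be wrapped in a small function. This is a minimal sketch: `make_loss_curve` and all of its argument names are hypothetical, the defaults simply mirror the constants above, and NumPy's `Generator.gamma` is used in place of `scipy.stats.gamma(3).rvs` (both draw from a gamma distribution with shape 3 and scale 1) so that a seed can be passed.

```python
import numpy as np

def make_loss_curve(n=2000, floor=0.00025, amplitude=0.001, tau=100.0,
                    noise_scale=2e-5, shape=3.0, seed=None):
    """Return a synthetic loss curve: exponential decay plus gamma noise.

    All parameter names are illustrative; the defaults reproduce the
    constants used in the one-line formula.
    """
    rng = np.random.default_rng(seed)
    x = np.arange(n)
    decay = np.exp(-x / tau)
    # Gamma noise is non-negative; its envelope (1 - decay) starts at zero
    noise = rng.gamma(shape, size=n) * (1.0 - decay) * noise_scale
    return floor + amplitude * decay + noise

y = make_loss_curve(seed=0)
```

Lowering `noise_scale` reduces the noise, while a larger `tau` makes the curve take longer to flatten.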
Upvotes: 1