Alnitak

Reputation: 2489

Numpy: how to generate a random noisy curve resembling a "training curve"

I'd like to know how I can generate some random data whose plot resembles a "training curve." By training curve, I mean an array of training loss values from a learning model. These typically have larger values and variance at the beginning, and over time converge to some value with very little variance. It looks a bit like a noisy exponential curve.

This is the closest I've gotten to making random data that resembles a training curve. The problems are that the curve does not flatten out or converge like true loss curves, and there is too much variance on the flatter part.

import numpy as np
import matplotlib.pyplot as plt

num_iters = 2000
rand_curve = np.sort(np.random.exponential(size=num_iters))[::-1]
noise  = np.random.normal(0, 0.2, num_iters)
signal = rand_curve + noise
noisy_curve = signal[signal > 0]
plt.plot(noisy_curve, c='r', label='random curve')

(Image: random curve)

And here is an actual training loss curve for reference.

(Image: true curve)

I do not know enough about probability distributions to know if this is a stupid question. I only wanted to generate a random curve so that others had a data array to work with to help me with another question I have about logarithmic plots in matplotlib.

Upvotes: 2

Views: 2566

Answers (2)

sometimesiwritecode

Reputation: 3213

It seems like you could damp the noise by a factor that depends on how far along the x-axis each value is. The variance would then decrease as the curve flattens. Something like:

import numpy as np
import matplotlib.pyplot as plt

num_iters = 2000
rand_curve = np.sort(np.random.exponential(size=num_iters))[::-1]
noise  = np.random.normal(0, 0.2, num_iters)

# Scale the noise down linearly: full strength at the start, zero at the end
damping = 1 - np.arange(num_iters) / num_iters
noise = noise * damping

signal = rand_curve + noise
noisy_curve = signal[signal > 0]
plt.plot(noisy_curve, c='r', label='random curve')

This way the noise values get smaller the further along the x-axis you go, which should give the result you want.
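For comparison, here is a minimal sketch of the same damping idea applied to a deterministic exponential baseline instead of sorted random draws, so that the curve also levels off at a fixed value. The function name `damped_noisy_curve` and all of its parameters (`start`, `floor`, `decay`, `noise_scale`, `seed`) are assumptions made up for illustration; they do not come from the question.

```python
import numpy as np

def damped_noisy_curve(num_iters=2000, start=1.0, floor=0.05,
                       decay=200.0, noise_scale=0.2, seed=0):
    """Hypothetical helper: exponential decay plus noise that shrinks over time."""
    rng = np.random.default_rng(seed)
    x = np.arange(num_iters)
    # Envelope is 1 at x = 0 and shrinks toward 0 as x grows
    envelope = np.exp(-x / decay)
    # Baseline falls from `start` toward `floor`
    baseline = floor + (start - floor) * envelope
    # Gaussian noise scaled by the same envelope, so variance decays too
    noise = rng.normal(0.0, noise_scale, num_iters) * envelope
    return baseline + noise

curve = damped_noisy_curve()
```

Plotting `curve` with `plt.plot(curve)` should show a noisy start that settles near `floor`. A smaller `decay` makes the curve flatten sooner.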

Upvotes: 0

sega_sai

Reputation: 8538

Here is an illustration of how to do it with a gamma distribution for the noise:

import numpy as np
import scipy.stats
x = np.arange(2000)
y = 0.00025 + 0.001 * np.exp(-x/100.) + scipy.stats.gamma(3).rvs(len(x))*(1-np.exp(-x/100))*2e-5

You can adjust the parameters here to reduce the amount of noise, etc.
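To make the adjustable parts easier to see, the same formula could be wrapped in a function along these lines. This is only a sketch: the name `gamma_noise_curve` and the parameter names are assumptions chosen for illustration, with defaults copied from the constants in the one-liner above. A fixed `seed` is passed to `rvs` so that runs are repeatable.

```python
import numpy as np
import scipy.stats

def gamma_noise_curve(n=2000, floor=0.00025, amplitude=0.001,
                      decay=100.0, noise_scale=2e-5, shape=3, seed=0):
    """Hypothetical wrapper around the formula above, with named parameters."""
    x = np.arange(n)
    # Smooth trend: decays from floor + amplitude toward floor
    trend = floor + amplitude * np.exp(-x / decay)
    # Gamma-distributed noise is non-negative and skewed toward large values
    noise = scipy.stats.gamma(shape).rvs(size=n, random_state=seed)
    # As in the original formula, the noise term ramps up from 0 to noise_scale
    ramp = 1 - np.exp(-x / decay)
    return trend + noise * ramp * noise_scale

default = gamma_noise_curve()
quieter = gamma_noise_curve(noise_scale=5e-6)  # same draws, smaller noise
```

Lowering `noise_scale` reduces the noise on the flat part, while `decay` controls how quickly the curve levels off.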


Upvotes: 1
