Some_acctg_guy

Reputation: 105

Creating fake data in Python

I am trying to create a function that creates fake data to use in a separate analysis. Here are the requirements for the function.

Problem 1

In this problem you will create fake data using numpy. In the cell below, the function create_data takes in 2 parameters, "n" and "rand_gen".

Here is the function I have created.

def create_data(n, rand_gen):
    '''
    Creates a numpy array with n samples from the standard normal distribution

    Parameters
    -----------
    n : integer for the number of samples to create
    rand_gen : pseudo-random number generator from numpy

    Returns
    -------
    numpy array from the standard normal distribution of size n
    '''

    numpy_array = np.random.randn(n)
    return numpy_array

Here is the first test I run on my function.

create_data(10, np.random.RandomState(seed=23))

I need the output to be this exact array.

[ 0.66698806,  0.02581308, -0.77761941,  0.94863382,  0.70167179,
 -1.05108156, -0.36754812, -1.13745969, -1.32214752,  1.77225828]

My output is different on every run. I do not fully understand how the RandomState call uses the seed to produce the array above instead of new random values each time. I know I need to use the rand_gen variable in my function, but I do not know where, and I think that is because I do not understand what it is meant to do.

Upvotes: 1

Views: 1879

Answers (2)

Charles Merriam

Reputation: 20530

I think the question you are asking is about pseudo-random numbers and reproducible random sequences.

Real random numbers are made from real-world unpredictable data, like watching lava lamps, while a pseudo-random number generator computes a long sequence of numbers that only appears random.

The basic algorithm is:

  1. get a seed, or a big number, maybe from the current clock time.
  2. take part of the seed as the random number
  3. do unspeakable mathematical mutilations to the seed involving bit-shifts, exponents, and multiplications.
  4. use the output of these calculations as the new seed, go to step 2.
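
The four steps above can be sketched as a toy generator. This is only an illustration: the names toy_generator and take are made up for this example, and the update rule is a simple linear congruential step with commonly cited constants, not the Mersenne Twister algorithm that numpy's RandomState uses.

```python
import time


def toy_generator(seed=None):
    """Yield an endless stream of pseudo-random integers (illustration only)."""
    # Step 1: get a seed, falling back to the clock when none is given.
    state = int(time.time()) if seed is None else seed
    while True:
        # Step 2: take part of the state (bits 16 to 31) as the "random" number.
        yield (state >> 16) & 0xFFFF
        # Steps 3 and 4: scramble the state and use the result as the new seed.
        state = (1664525 * state + 1013904223) % 2**32


def take(n, seed):
    """Collect the first n values produced from the given seed."""
    gen = toy_generator(seed)
    return [next(gen) for _ in range(n)]


print(take(5, seed=23) == take(5, seed=23))  # True: same seed, same sequence
```

Because the first value is read before any scrambling happens, a small seed always gives 0 first; real generators mix the seed much more thoroughly.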

The trick is that the same seed produces the same sequence every time. You can seed numpy's global generator with numpy.random.seed(). The np.random.RandomState(seed=23) in your test instead creates a separate generator object that carries its own seed.
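
A short check of that reproducibility, assuming numpy is imported as np. The seed 23 is taken from the question; the variable names are only for illustration.

```python
import numpy as np

# Two generators built from the same seed produce identical draws.
first = np.random.RandomState(seed=23).randn(5)
second = np.random.RandomState(seed=23).randn(5)
print(np.array_equal(first, second))  # True

# Re-seeding the global generator has the same effect on np.random.randn.
np.random.seed(23)
a = np.random.randn(5)
np.random.seed(23)
b = np.random.randn(5)
print(np.array_equal(a, b))  # True
```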

I hope this is the question you were asking.

Upvotes: 1

Some_acctg_guy

Reputation: 105

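Define numpy_array = rand_gen.randn(n). That way the function draws from the seeded generator passed in as rand_gen instead of from the unseeded global np.random.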
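
Put together, the function might look like the sketch below. It assumes rand_gen is a np.random.RandomState (or any object with a compatible randn method), as in the test from the question.

```python
import numpy as np


def create_data(n, rand_gen):
    """Return n samples from the standard normal distribution, drawn from rand_gen."""
    # Use the generator that was passed in, not the global np.random.
    return rand_gen.randn(n)


# A fresh generator with the same seed gives the same array on every call.
result = create_data(10, np.random.RandomState(seed=23))
repeat = create_data(10, np.random.RandomState(seed=23))
print(np.array_equal(result, repeat))  # True
```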

Upvotes: 1
