Dhiraj Gandhi

Reputation: 151

Normal Distribution using Numpy

I want to generate a dataset of m random data points with k dimensions each, so the resulting array has shape (m, k). The points should be i.i.d. draws from a normal distribution with mean 0 and standard deviation 1. I know of two ways to generate them.

First way:

import numpy as np

m, k = 1000, 5  # example sizes: m data points, k dimensions

# Initialize the array
Z = np.zeros((m, k))

# Generate each coordinate of each point independently of the others
for datapoint in range(m):
    z = [np.random.standard_normal() for _ in range(k)]
    Z[datapoint] = z

Second way:

import numpy as np

m, k = 1000, 5  # example sizes: m data points, k dimensions

# Directly sample all points in one call
Z = np.random.normal(0, 1, (m, k))

My understanding is that the second way produces points that are not independent of each other, while the first produces i.i.d. points. Is that the difference between the two pieces of code?

Upvotes: 0

Views: 937

Answers (1)

tom10

Reputation: 69182

My assumption would be that standard_normal is just normal with "standard" parameters (mean=0 and std=1).

Let's test that:

import numpy as np

rng0 = np.random.default_rng(43210)
rng1 = np.random.default_rng(43210)

print(rng0.standard_normal(10))
print(rng1.normal(0, 1, 10))

which gives:

[ 0.62824213 -1.18535536 -1.18141382 -0.74127753 -0.41945915 1.02656223 -0.64935657  1.70859865  0.47731614 -1.12700957]
[ 0.62824213 -1.18535536 -1.18141382 -0.74127753 -0.41945915 1.02656223 -0.64935657  1.70859865  0.47731614 -1.12700957]

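So the assumption appears to be correct: with the same seed, both calls return identical values. Since normal(0, 1, size) draws every element independently, the second way in the question also produces i.i.d. points; it is simply the vectorized form of the first.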
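The same test can be extended to the (m, k) arrays from the question. The sketch below is a minimal illustration, assuming hypothetical sizes m=4, k=3 and reusing the seed from above. It reseeds NumPy's legacy global generator before each of the two ways, then runs a rough i.i.d. sanity check on a larger sample drawn with the newer Generator API:

```python
import numpy as np

# Hypothetical sizes for illustration; the question leaves m and k unspecified.
m, k = 4, 3
seed = 43210

# First way: fill the array row by row with scalar draws from the legacy global generator.
np.random.seed(seed)
Z_loop = np.zeros((m, k))
for datapoint in range(m):
    Z_loop[datapoint] = [np.random.standard_normal() for _ in range(k)]

# Second way: draw the whole (m, k) array in one vectorized call after reseeding.
np.random.seed(seed)
Z_vec = np.random.normal(0, 1, (m, k))

# Same seed, same underlying draws consumed in row-major order -> identical arrays.
print(np.array_equal(Z_loop, Z_vec))  # True

# Rough i.i.d. sanity check on a larger sample using the newer Generator API.
rng = np.random.default_rng(seed)
big = rng.normal(0, 1, (100_000, k))
print(big.mean(axis=0))                # each column mean should be close to 0
print(big.std(axis=0))                 # each column std should be close to 1
print(np.corrcoef(big, rowvar=False))  # off-diagonal entries should be close to 0
```

The loop fills the array one row at a time, k values per row, which is the same order in which the vectorized call fills an (m, k) array. That is why both ways consume the same random stream and end up equal.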

Upvotes: 1
