Reputation: 151
I want to generate a dataset of m random data points with k dimensions each, resulting in an array of shape (m, k). These points should be i.i.d. from a normal distribution with mean 0 and standard deviation 1. There are two ways of generating these points.
First way:
import numpy as np
# Initialize the array
Z = np.zeros((m, k))
# Generate each point of each dimension independent of each other
for datapoint in range(m):
    z = [np.random.standard_normal() for _ in range(k)]
    Z[datapoint] = z[:]
Second way:
import numpy as np
# Directly sample the points
Z = np.random.normal(0, 1, (m, k))
My understanding is that the second way produces data points that are not independent of each other, while the first way produces an i.i.d. dataset. Is this the difference between the two pieces of code?
Upvotes: 0
Views: 937
Reputation: 69182
My assumption would be that standard_normal is just normal with "standard" parameters (mean=0 and std=1).
Let's test that:
import numpy as np
rng0 = np.random.default_rng(43210)
rng1 = np.random.default_rng(43210)
print(rng0.standard_normal(10))
print(rng1.normal(0, 1, 10))
which gives:
[ 0.62824213 -1.18535536 -1.18141382 -0.74127753 -0.41945915 1.02656223 -0.64935657 1.70859865 0.47731614 -1.12700957]
[ 0.62824213 -1.18535536 -1.18141382 -0.74127753 -0.41945915 1.02656223 -0.64935657 1.70859865 0.47731614 -1.12700957]
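So I think that assumption was correct: with the same seed, both calls return identical values. Passing a shape such as (m, k) just repeats the same independent draw for every element, so both ways in the question give i.i.d. points.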
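To tie this back to the two snippets in the question, here is a rough sketch that fills an array one scalar draw at a time and compares it with a single sized call from an identically seeded generator. The sizes m = 4 and k = 3 and the variable names are made up for illustration. On the NumPy versions I'm aware of, the two arrays should come out identical, but as far as I know that is an implementation detail rather than a documented guarantee. The second part is only an empirical sanity check: a near-zero sample correlation between columns is consistent with independence, it does not prove it.

```python
import numpy as np

m, k = 4, 3    # example sizes, chosen only for illustration
seed = 43210   # same seed as in the test above

# First way: fill the array one scalar draw at a time
rng_loop = np.random.default_rng(seed)
Z_loop = np.zeros((m, k))
for datapoint in range(m):
    Z_loop[datapoint] = [rng_loop.standard_normal() for _ in range(k)]

# Second way: draw the whole (m, k) array in a single call
rng_vec = np.random.default_rng(seed)
Z_vec = rng_vec.normal(0, 1, (m, k))

# Compare the two arrays element by element
same_draws = np.array_equal(Z_loop, Z_vec)
print(same_draws)

# Sanity check on a larger sample: correlation between columns
rng_big = np.random.default_rng(seed)
Z_big = rng_big.normal(0, 1, (100_000, k))
corr = np.corrcoef(Z_big, rowvar=False)          # k x k correlation matrix
max_off_diag = np.abs(corr - np.eye(k)).max()    # largest off-diagonal magnitude
print(max_off_diag)
```

If same_draws is True, the loop and the single call consume the random stream in the same order, so the only practical difference between the two ways is speed.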
Upvotes: 1