Reputation: 65
I've been asked to look at how the central limit theorem applies to uniformly distributed random numbers. For the first part of the problem, I'm asked to create 1,000,000 bins with one number in each bin, and then with 2, 3, and 10 numbers in each bin.
I've used the NumPy package for creating histograms, but trying to create 1,000,000 bins with one number in each bin takes an ungodly amount of time. I was able to create histograms with 1,000 and 10,000 bins and random numbers, though, so I think numpy.histogram just isn't an efficient method for handling a large number of bins.
Are there other methods for creating histograms with large amounts of data and bins?
EDIT: the random numbers are in the interval [0,1].
Upvotes: 2
Views: 819
Reputation: 868
You've left details out of your question that could be crucial.
What's your bin size (i.e. do you have 1M bins over [0,1], over [0,20], or over [0,1M])? What are your performance requirements, and what counts as "slow" for your purposes? Are you hitting memory limits, CPU limits, or something else?
One trivial solution is to use random.random() to generate a random number in [0,1), and then scale and shift it (multiplication and addition) to sample from whichever interval you need.
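As a minimal sketch of that scaling: the helper name `sample_in_interval` and the interval 5 to 20 are made up for illustration, and the standard library's `random.uniform` already does the same job.

```python
import random

def sample_in_interval(low, high):
    """Return a float between low and high by scaling and shifting random.random()."""
    # random.random() returns a float in [0.0, 1.0)
    return low + (high - low) * random.random()

random.seed(0)  # fixed seed only so the example is repeatable
samples = [sample_in_interval(5.0, 20.0) for _ in range(5)]
print(samples)
```

Multiplying stretches the unit interval to the width you want, and adding `low` moves it to where it should start.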
The following code samples 1M bins, of size 1 each, with each bin containing 2 numbers.
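import random

hist_data = []
in_each_bin = 2  # how many numbers to draw per bin
for i in range(1000000):  # one unit-width bin per i, covering [i, i+1)
    for j in range(in_each_bin):
        # shift a [0,1) sample into bin i
        hist_data.append(i + random.random())
print(len(hist_data))
print(hist_data[0:20])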
It runs in about 3.4 seconds of wall-clock time on my mid-range machine:
$ time python3 pytest.py
2000000
[0.9271533001749838, 0.6759096885597532, 1.0950935186564377, 1.4195955772696995, 2.620307487968376, 2.535700184898931, 3.606823695579621, 3.5471311130365346, 4.01255833303964, 4.013715023517034, 5.42988725471679, 5.257435390135351, 6.681956593279519, 6.686189487682324, 7.916591795688389, 7.598478524938438, 8.309152266029844, 8.997231092516385, 9.801082205541228, 9.198095437802664]
real 0m3.418s
user 0m2.547s
sys 0m0.500s
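If the pure-Python loop is still too slow, the same data can probably be generated in one vectorized step with NumPy. This is only a sketch under my own assumptions (unit-width bins over [0, 1M), an arbitrary seed, and `numpy.histogram` for the counting); I have not timed it against the loop above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed is an arbitrary choice for repeatability

n_bins = 1_000_000
in_each_bin = 2

# One row per bin: row i holds in_each_bin numbers drawn from [i, i + 1)
offsets = np.arange(n_bins).reshape(-1, 1)
hist_data = (offsets + rng.random((n_bins, in_each_bin))).ravel()

# Count how many values fall in each unit-width bin
counts, edges = np.histogram(hist_data, bins=n_bins, range=(0, n_bins))

print(hist_data.size)
print(counts[:10])
```

Passing an integer `bins` together with `range` gives equal-width bins, which keeps the counting step cheap even with a million of them.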
Does that fit your needs and requirements?
Upvotes: 1