qwop

Counting frequencies of numeric data in range intervals

I am trying to improve my code, which sorts randomly generated numbers into range intervals in order to analyze the accuracy of the random number generator. Currently the sorting is done by two chains of 20 if/elif statements (I only have an introductory knowledge of Python), and as a result my code takes a long time to execute. How can I sort numeric data into intervals more efficiently, keeping only the count of numbers in each interval?

from datetime import datetime
startTime = datetime.now()
def test_rand(points):
    import random
    d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15,d16,d17,d18,d19,d20 = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    # these variables will be used to count frequency of numbers into 20 intervals: (-10,-9], (-9,-8] ... etc
    g1,g2,g3,g4,g5,g6,g7,g8,g9,g10,g11,g12,g13,g14,g15,g16,g17,g18,g19,g20 = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    # these variables will be used to count frequency of every 20 numbers into 20 intervals: (-200,-180], (-180,-160] ... etc
    y = 0
    n = 0
    for i in range(points):
        x = random.uniform(-10.0,10.0)
        while n < 20:
            y += x
            n += 1
            break
        if n == 20:
            if y < -180:
                g1 += 1
            elif y < -160 and y > -180:
                g2 += 1
            elif y < -140 and y > -160:
                g3 += 1
            elif y < -120 and y > -140:
                g4 += 1
            elif y < -100 and y > -120:
                g5 += 1
            elif y < -80 and y > -100:
                g6 += 1
            elif y < -60 and y > -80:
                g7 += 1
            elif y < -40 and y > -60:
                g8 += 1
            elif y < -20 and y > -40:
                g9 += 1
            elif y < 0 and y > -20:
                g10 += 1
            elif y < 20 and y > 0:
                g11 += 1
            elif y < 40 and y > 20:
                g12 += 1
            elif y < 60 and y > 40:
                g13 += 1
            elif y < 80 and y > 60:
                g14 += 1
            elif y < 100 and y > 80:
                g15 += 1
            elif y < 120 and y > 100:
                g16 += 1
            elif y < 140 and y > 120:
                g17 += 1
            elif y < 160 and y > 140:
                g18 += 1
            elif y < 180 and y > 160:
                g19 += 1
            elif y > 180:
                g20 += 1
            y *= 0
            n *= 0

        if x < -9:
            d1 += 1
        elif x < -8 and x > -9:
            d2 += 1
        elif x < -7 and x > -8:
            d3 += 1
        elif x < -6 and x > -7:
            d4 += 1
        elif x < -5 and x > -6:
            d5 += 1
        elif x < -4 and x > -5:
            d6 += 1
        elif x < -3 and x > -4:
            d7 += 1
        elif x < -2 and x > -3:
            d8 += 1
        elif x < -1 and x > -2:
            d9 += 1
        elif x < 0 and x > -1:
            d10 += 1
        elif x < 1 and x > 0:
            d11 += 1
        elif x < 2 and x > 1:
            d12 += 1
        elif x < 3 and x > 2:
            d13 += 1
        elif x < 4 and x > 3:
            d14 += 1
        elif x < 5 and x > 4:
            d15 += 1
        elif x < 6 and x > 5:
            d16 += 1
        elif x < 7 and x > 6:
            d17 += 1
        elif x < 8 and x > 7:
            d18 += 1
        elif x < 9 and x > 8:
            d19 += 1
        elif x > 9:
            d20 += 1

    return d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15,d16,d17,d18,d19,d20,g1,g2,g3,g4,g5,g6,g7,g8,g9,g10,g11,g12,g13,g14,g15,g16,g17,g18,g19,g20

print(test_rand(100000000))    

print (datetime.now() - startTime)

The code is meant to do two things with the random numbers. The first is to sort the numbers into 20 intervals (so 5% of the numbers should fall into each interval). The second is to sum every 20 generated numbers and sort these sums into 20 new intervals (an approximately normal curve should be observed).
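To judge the generator, the observed counts can be compared with the counts theory predicts. The sketch below is a minimal one that assumes the normal approximation (central limit theorem) is good enough for a sum of 20 values: one value from `uniform(-10, 10)` has mean 0 and variance 20²/12 ≈ 33.3, so a sum of 20 independent values has mean 0 and standard deviation √(20 · 400 / 12) ≈ 25.8. The names `N_TERMS`, `SIGMA`, `normal_cdf` and `expected_sum_fractions` are hypothetical helpers, not part of the original code.

```python
import math

N_TERMS = 20                # values summed per group, as in the question
LOW, HIGH = -10.0, 10.0     # range passed to random.uniform
# Variance of one uniform value is (HIGH - LOW)**2 / 12; variances add for independent values.
SIGMA = math.sqrt(N_TERMS * (HIGH - LOW) ** 2 / 12)   # about 25.82


def normal_cdf(z):
    """Standard normal CDF written with math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def expected_sum_fractions():
    """Approximate share of 20-value sums in each of the 20 sum intervals.

    Uses the normal approximation with mean 0; the outermost intervals absorb both tails.
    """
    edges = [-180 + 20 * i for i in range(19)]          # inner boundaries -180, -160, ..., 180
    cdf = [0.0] + [normal_cdf(e / SIGMA) for e in edges] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(20)]


expected_single = [1 / 20] * 20          # uniform values: 5% per unit-width interval
expected_sums = expected_sum_fractions()

for i, frac in enumerate(expected_sums):
    lower = -200 + 20 * i
    print(f"({lower}, {lower + 20}]: {frac:.4%}")
```

Multiplying these fractions by the number of values (first task) or by `points // 20` (second task) gives expected counts to set beside the measured ones. The two central sum intervals should each hold roughly 28% of the sums, while the outermost ones should be practically empty.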

@Tristan I've modified your code to perform the above:

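from bisect import bisect
from random import uniform

def test_rand(points):
    occ1 = [-9 + 1 * i for i in range(19)]
    occ2 = [-180 + 20 * i for i in range(19)]
    counter1, counter2 = {i: 0 for i in range(20)}, {i: 0 for i in range(20)}
    val_20 = 0                                   # running sum of the current group of 20
    for idx in range(points):
        val_1 = uniform(-10, 10)
        val_20 += val_1
        counter1[bisect(occ1, val_1)] += 1       # every single value is binned
        if (idx + 1) % 20 == 0:
            counter2[bisect(occ2, val_20)] += 1  # bin the completed sum of 20 values
            val_20 = 0
    return counter1, counter2

print(test_rand(100000000))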

While this method only saves 6 seconds (1:54 --> 1:48), it is FAR more organized and easier to read. Thanks for the help!

Tristan

Assuming that the data can always be assigned to one of your intervals (you could check this beforehand), using bisect.bisect() is an efficient and compact approach:

from bisect import bisect
from random import randint

occ1 = [-9 + 1 * i for i in range(19)]
occ2 = [-180 + 20 * i for i in range(19)]
data = [randint(-10, 10) for _ in range(100)]
counter1, counter2 = {i: 0 for i in range(20)}, {i: 0 for i in range(20)}

for idx, element in enumerate(data):
    if (idx + 1) % 20 == 0:
        counter2[bisect(occ2, element)] += 1
    else:
        counter1[bisect(occ1, element)] += 1

The bisect() function returns the position at which an element would have to be inserted into a sorted list such as occ1 or occ2 to keep it sorted. With 19 values in the list, there are 20 possible insertion positions: before the first element, between any two neighbouring elements, or after the last one. These correspond to your 20 intervals. The only caveat is that an element smaller than the lowest bound or larger than the highest bound of your intervals will still be assigned to the lowest or highest interval. Generating random numbers that stay within the interval bounds prevents that, though.
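The mapping can be checked on a few hand-picked values. This sketch reuses the answer's `occ1` boundaries; the sample values are arbitrary. Note that `bisect()` is the same function as `bisect.bisect_right()`, so a value lying exactly on a boundary goes into the interval to its right, which gives half-open intervals [a, b). If the (a, b] intervals written in the question's comments are wanted instead, `bisect.bisect_left()` gives that convention.

```python
from bisect import bisect, bisect_left

# The answer's boundary list: 19 inner boundaries -9, -8, ..., 9.
occ1 = [-9 + 1 * i for i in range(19)]

samples = [-9.5, -8.5, -9, 9.5, 42]
right = [bisect(occ1, v) for v in samples]       # bisect() is bisect_right()
left = [bisect_left(occ1, v) for v in samples]   # boundary values go to the left

# right == [0, 1, 1, 19, 19]; left == [0, 1, 0, 19, 19]
for v, r, l in zip(samples, right, left):
    print(f"{v:>5}: bisect -> {r:2d}, bisect_left -> {l:2d}")
```

Only the exact boundary value -9 is treated differently by the two functions; the out-of-range value 42 is clamped to the last interval by both.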

From your question I am not sure whether you want to accumulate random numbers, or just go through the list of points and perform a different check on every 20th value. The solution can easily be adapted to accumulate random numbers until 20 iterations are reached:

from bisect import bisect
from random import uniform

points, value = 100000000, 0
occ1 = [-9 + 1 * i for i in range(19)]
occ2 = [-180 + 20 * i for i in range(19)]
counter1, counter2 = {i: 0 for i in range(20)}, {i: 0 for i in range(20)}

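for idx in range(points):
    sample = uniform(-10, 10)
    value += sample                          # running sum of the current group of 20
    if (idx + 1) % 20 == 0:
        counter2[bisect(occ2, value)] += 1   # bin the completed sum
        value = 0
    else:
        counter1[bisect(occ1, sample)] += 1  # bin the single value, not the running sum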

This runs in 100 seconds for 100M points on my machine.
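Because every interval in both tasks has the same width, another option, not part of the original answer, is to compute the interval index arithmetically instead of searching a boundary list. Below is a minimal sketch; `bin_index` and `count_intervals` are hypothetical names, the clamping imitates what `bisect()` does with out-of-range values, and it has not been timed against the `bisect()` version. Values sitting exactly on a boundary may land one interval off due to floating-point rounding.

```python
from math import floor
from random import uniform


def bin_index(value, low, width, n_bins=20):
    """Index k of the interval [low + k*width, low + (k+1)*width) containing value.

    Out-of-range values are clamped to the first or last interval.
    """
    k = floor((value - low) / width)
    return min(max(k, 0), n_bins - 1)


def count_intervals(points):
    counter1 = [0] * 20   # single values: width-1 intervals starting at -10
    counter2 = [0] * 20   # sums of 20 values: width-20 intervals starting at -200
    total = 0.0
    for idx in range(points):
        x = uniform(-10, 10)
        counter1[bin_index(x, -10, 1)] += 1
        total += x
        if (idx + 1) % 20 == 0:
            counter2[bin_index(total, -200, 20)] += 1
            total = 0.0
    return counter1, counter2


counts1, counts2 = count_intervals(200_000)
print(counts1)
print(counts2)
```

Plain lists indexed from 0 to 19 replace the dictionaries here, since the interval index is already a small integer.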
