Jonas Palačionis

Reputation: 4842

Why are the means different when appending items to an np.array and a list in Python?

I have code that looks like this:

import numpy as np

list_a = []
arr = np.array([])
for _ in range(1,101):
    list_a.append(np.random.randint(1,1001,100).mean())
    arr = np.append(arr, np.random.randint(1,1001,100).mean())

print(f'casted list to np.array mean - {np.array(list_a).mean()}')
print(f'old school average - {sum(list_a)/len(list_a)}')
print(f'just arr.mean - {arr.mean()}')
print(f'first array element - {arr[0]}')
print(f'first list element - {list_a[0]}')
print(f'last arr element - {arr[99]}')
print(f'last list element - {list_a[99]}')

This prints:

casted list to np.array mean - 498.9785
old school average - 498.97850000000005
just arr.mean - 499.5889000000001
first array element - 510.76
first list element - 518.8
last arr element - 527.54
last list element - 521.58

Why do I get means that are not equal, and why are the first and last elements (and, I assume, the rest too) not equal when they are filled inside the same loop? Is there a difference between casting a list to np.array and taking the mean versus appending items to an np.array and taking the mean?

Upvotes: 0

Views: 46

Answers (2)

Supergrover

Reputation: 74

np.random.randint() generates new random numbers every time you call it, so it makes sense that the values in the list and in the array are different. If you want to add the same number to both, store it in a variable first and then append it to both:

import numpy as np

list_a = []
arr = np.array([])
for _ in range(1,101):
    # Draw the sample once so the list and the array receive the same value
    value = np.random.randint(1,1001,100).mean()
    list_a.append(value)
    arr = np.append(arr, value)

print(f'casted list to np.array mean - {np.array(list_a).mean()}')
print(f'old school average - {sum(list_a)/len(list_a)}')
print(f'just arr.mean - {arr.mean()}')
print(f'first array element - {arr[0]}')
print(f'first list element - {list_a[0]}')
print(f'last arr element - {arr[99]}')
print(f'last list element - {list_a[99]}')

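The difference between the first two averages is not a difference in the data: both are computed from the same values in list_a. It is floating-point rounding. np.mean and sum() can add the values in a different order, so the two results differ by only about 5e-14.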
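A minimal sketch of how to check that the two averages agree up to rounding. The fixed seed and the tolerance of 1e-12 are assumptions chosen for illustration, not part of the original code:

```python
import math
import numpy as np

np.random.seed(0)  # assumed seed, only so the run is repeatable

# Same kind of data as in the question: 100 means of 100 random integers
values = [np.random.randint(1, 1001, 100).mean() for _ in range(100)]

mean_numpy = float(np.array(values).mean())  # NumPy's summation
mean_plain = float(sum(values) / len(values))  # left-to-right Python sum

# Print with repr so no digits are hidden
print(repr(mean_numpy), repr(mean_plain))

# Compare with a tolerance instead of ==
print(math.isclose(mean_numpy, mean_plain, rel_tol=1e-12))
```

Comparing with math.isclose (or np.isclose) instead of == is the usual way to treat two floats as equal when they may differ only in the last digits.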

Upvotes: 0

Karl Knechtel

Reputation: 61635

Why do I get means that are not equal

Because the data is different, as you found with the other tests.

and why are the first and last elements (and, I assume, the rest too) not equal when they are filled inside the same loop?

Because each time through the loop, you do np.random.randint(1,1001,100).mean() to determine a value to append to list_a, and then you do it again to determine a value to append to arr. np.random.randint is used to produce random numbers, so of course it produces a different array on each of those two calls; and so the means are different, and so the values stored are different.
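A short sketch of this behaviour, assuming a fixed seed of 42 purely for reproducibility: two consecutive calls return different samples, while resetting the seed replays the first one.

```python
import numpy as np

np.random.seed(42)  # assumed seed, not part of the original code

first = np.random.randint(1, 1001, 100)
second = np.random.randint(1, 1001, 100)

# Two separate calls draw two separate samples
print(np.array_equal(first, second))
print(first.mean(), second.mean())

# Resetting the seed restarts the sequence, so the first sample comes back
np.random.seed(42)
replay = np.random.randint(1, 1001, 100)
print(np.array_equal(first, replay))
```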

Is there a difference when casting list to np.array and getting the mean vs just appending items to a np.array and getting the mean?

There is no such thing in Python as "casting", but no, you get the same value this way. I know, your output shows 498.9785 in one case and 498.97850000000005 in another. These are extremely close. Floating-point work sometimes involves a tiny amount of imprecision.
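As a rough check of the "same value" claim, the sketch below feeds one value per iteration into both containers and compares the results. The seed is an assumption made only for repeatability:

```python
import numpy as np

np.random.seed(1)  # assumed seed, only so the run is repeatable

list_a = []
arr = np.array([])
for _ in range(100):
    value = np.random.randint(1, 1001, 100).mean()  # one draw per iteration
    list_a.append(value)
    arr = np.append(arr, value)

from_list = np.array(list_a)

# Both routes produce the same float64 array, so the means match exactly
print(from_list.dtype, arr.dtype)
print(np.array_equal(from_list, arr))
print(from_list.mean() == arr.mean())
```

With identical data in both arrays, NumPy performs the same computation on each, so here even an exact == comparison of the two means holds.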

Upvotes: 2
