Reputation: 55
I have a list of several arrays and I want the average of all the data points together, pooled across every array. How can I do this? The code below gives me the mean of each array in the list instead.
cpw = [array([4.2, 4.1, 4.3, 4.3, 4.2, 4.3, 4.1, 4.2, 4.2, 4.1, 4.2, 4.2, 4. ,
4.1, 4.3, 4.1, 4.1, 4.4, 4.1, 4.2, 4.3, 4.1, 4.1, 4.1, 4.2, 4.2,
4.4, 4.2]), array([4.1, 4. , 4. , 4. , 4. , 4. , 4.1, 4. , 4. , 4.1, 4. , 4. , 4.1,
4. , 4. , 4.1, 4. , 4. , 4.1, 4. , 4. , 4. , 4. , 4. , 4. , 4. ,
4. , 4. ]), array([3.9, 3.8, 3.8, 3.9, 3.8, 3.8, 3.9, 3.8, 3.9, 3.9, 3.8, 3.9, 3.9,
3.8, 3.9, 6.3, 3.8, 3.9, 3.9, 3.8, 3.9, 3.9, 3.9, 3.9, 3.8, 3.9,
3.8, 3.9, 3.9, 3.9, 3.9]), array([3.7, 3.7, 3.8, 3.7, 3.7, 3.8, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7,
3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7]), array([5.1, 4.6, 4.5, 4.6, 4.6, 5.4, 4.6, 4.6, 4.5, 4.6, 4.6, 4.5, 4.6,
4.6, 4.5, 4.6, 4.6, 4.5, 4.6, 2.6, 4.5, 5.4, 3.7, 4.5, 5.4, 1.7,
4.5, 1.4, 4.7, 4.9, 5. , 7.3]),
import numpy as np

output = []
for i in range(len(cpw)):
    output.append(np.mean(cpw[i]))  # mean of one array only
print(output)
[4.189285714285715, 4.021428571428571, 3.941935483870968, 3.7083333333333335, 4.49375, 4.2285714285714295, 4.400000000000001, 4.189285714285715, 4.021428571428571, 3.941935483870968, 3.7083333333333335,...
I suspect a zip function is needed to unpack the arrays, but I am not sure.
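A quick check suggests `zip` is probably not the right tool here. The minimal sketch below uses small made-up arrays (the name `arrays` is hypothetical, standing in for `cpw`): `zip(*arrays)` groups the i-th element of every array and stops at the shortest one, so most values are silently dropped, whereas the pooled mean needs every value.

```python
import numpy as np

# Small made-up arrays of different lengths, standing in for cpw
arrays = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0]), np.array([6.0])]

# zip(*arrays) builds position-wise tuples and stops at the shortest array
pairs = list(zip(*arrays))
print(len(pairs))  # 1: only the first element of each array survives

# Pooled mean over every value in every array
pooled = np.concatenate(arrays).mean()
print(pooled)  # 3.5
```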
Upvotes: 2
Views: 1115
Reputation: 14997
If your data is not too large, you can simply concatenate the arrays and take the mean of the result. However, concatenation copies the full dataset into one new array, which may need a lot of memory if your data is large.
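In that case, compute the mean and length of each array separately, then take the average of those means weighted by the lengths. The weights matter: a plain average of the per-array means would give a short array the same influence as a long one.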
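import numpy as np

counts = [len(values) for values in cpw]   # number of points in each array
means = [values.mean() for values in cpw]  # mean of each array
mean = np.average(means, weights=counts)   # length-weighted mean of the means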
Or, with concatenation:
np.concatenate(cpw).mean()
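A minimal sketch checking that these approaches agree, using small made-up arrays in place of `cpw` (the names `arrays`, `total`, `n` and the sample values are hypothetical). It also computes an unweighted mean of the per-array means to show how it drifts from the pooled value. The last variant, a running sum and count, is a swapped-in technique not taken from the answer; like the weighted average, it avoids building a concatenated copy of the data.

```python
import numpy as np

# Made-up arrays of different lengths, standing in for cpw
arrays = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0]), np.array([6.0])]

# Pooled mean by concatenation (builds one combined copy of the data)
concat_mean = np.concatenate(arrays).mean()

# Mean of the per-array means, weighted by each array's length
counts = [len(values) for values in arrays]
means = [values.mean() for values in arrays]
weighted_mean = np.average(means, weights=counts)

# Unweighted mean of the means: every array counts equally regardless of length
naive_mean = np.mean(means)

# Running sum and count (swapped-in variant): no combined copy, no list of means
total = sum(values.sum() for values in arrays)
n = sum(values.size for values in arrays)
streamed_mean = total / n

print(concat_mean, weighted_mean, streamed_mean)  # 3.5 3.5 3.5
print(naive_mean)  # about 4.1667, not the pooled mean
```

The weighted and running-sum versions only keep one number (or one pair of numbers) per array, which is why they scale better than concatenation when the arrays are large.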
Upvotes: 2