Reputation: 546
I'm currently facing the following problem: During an experiment I gathered the counts of events per channel (detector). Plotting the counts against the channels gives almost a normal distribution. I'd now like to calculate the mean of this data set. The problem is that not all of the events that generated the data occur with the same probability, but the probability for each channel is known.
To me this situation seems very similar to wanting to calculate the mean of a histogram, therefore I would take the middle value of the channels, multiply it by the corresponding value of the channel, sum all the values up and then divide by the total number of channels.
My implementation for this is:
import numpy as np
import matplotlib.pyplot as plt
counts = ... # see at the end of the post for the data set in question
channels = np.arange(1,len(counts)+1)
channel_probability = .... # probability for different parts of channels
mean = sum((channels+1)/2 * counts)/len(counts)
plt.figure()
plt.plot(counts, channels)
plt.stem([mean], [100])
plt.xlabel("channels")
plt.ylabel("counts")
plt.show()
The problem is that this assumes the same probability for all the events... Therefore I tried the naive approach of just multiplying the probability as well
mean = sum((channels+1)/2 * counts * channels_probability)/len(counts)
But this of course only led to completely unreasonable results... So, can someone maybe explain how I would find the mean of such a distribution and how to calculate it?
As mentioned above, here is the something similar to the data set I'm using:
counts = np.array([2.05209753 2.07860064 2.06269877 2.0706497 2.07595033 2.03619567
2.03619567 2.06269877 2.02029381 2.00439194 2.01499318 1.9937907
1.98583977 1.99909132 1.99909132 2.00439194 1.98583977 1.98849008
1.99644101 2.01499318 2.00439194 2.0176435 2.02824474 1.99909132
2.00174163 2.03354536 2.05474784 2.05474784 2.04944722 2.11305467
2.07330002 2.13955778 2.18461305 2.19256399 2.21906709 2.25617144
2.23496895 2.25617144 2.31182796 2.32772982 2.36483417 2.3992882
2.42844162 2.49734969 2.56890807 2.56095714 2.59541118 2.59541118
2.63516583 2.68817204 2.6272149 2.66961987 2.6272149 2.66961987
2.60336211 2.62191428 2.56890807 2.5503559 2.53975466 2.52385279
2.45229441 2.42844162 2.39133727 2.29592609 2.27737392 2.26147206
2.21906709 2.14220809 2.17666212 2.09185219 2.03619567 2.02824474
2.05209753 2.00439194 1.97788884 1.97788884 1.9672876 1.96463729
1.96993791 1.95403604 1.94608511 1.9434348 1.9434348 1.93548387
1.93813418 1.9434348 1.94078449 1.93813418 1.94078449 1.9434348])
Upvotes: 1
Views: 90
Reputation: 445
My assumptions:
Suppose you have a nice dice with nine sides, each side has a number.
numbs = [10, 24, 26, 8, 17, 6, 9, 15, 20]
Each number has the same probability: 1/9
. You may ask, what is the expected value of the dice? Well, with Python it is easy.
prob_li = []
for l, prob in zip(numbs, [1/9] * 9):
prob_li.append(l * prob)
print(sum(prob_li)) # 15
If the probability of each side changes, say something like
probs = [1/9, 1/9, 1/9, 1/9, 1/9, 1/10, 1/20, 1/20, 11/45]
the expected value is
prob_li = []
for l, prob in zip(numbs, probs):
prob_li.append(l * prob)
print(sum(prob_li)) # 16.13
Now suppose that you construct a matrix, and each column has a probability probs[i]
np.random.seed(4)
mat = np.random.randint(6, 20, size=(3, 9))
mat
is a matrix with a shape (3,9). I would find the expected value as
result = mat * probs
print(sum(mat.mean(axis=0) * probs)) #12.82
print(sum(result.sum(axis=0) * probs)) #38.46
For me, 12.82 has more sense than 38.46. Besides, you said that plotting the counts against the channels gives almost a normal distribution, you would only need to find the mean of each channel and then the expected value.
Upvotes: 1