Reputation: 3459
Let's say I have a normal distribution and I want to sample from it 1000 times. Each time the value falls in the top 20% (above the 80th percentile), I want to add 1 to a counter. What's the optimal solution for this?
I'm trying to solve this using numpy and can't figure out the math behind it. Right now I have this, but I feel like it could be done mathematically:
import numpy as np
s = np.random.normal(0, 1, 1000)
sum([val for val in s if val > np.percentile(s, 80)])
Upvotes: 1
Views: 97
Reputation:
This generates the array (you can change the mean and standard deviation; the values below are the defaults):
mu = 0
std = 1
arr = np.random.normal(mu, std, 1000)
This gives you the number of items in the top 20% (above the 80th percentile):
arr[arr > np.percentile(arr, 80)].size
Out[30]: 200
Edit: Your code is also nice, but you don't want to sum the values, you want to count them. So whenever val > np.percentile(s, 80) is true, you want to sum 1's:
sum([1 for val in s if val > np.percentile(s, 80)])
Out[35]: 200
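This will be slower than numpy's methods, though, because the list comprehension runs in Python and recomputes np.percentile(s, 80) for every element.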
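A minimal sketch of the vectorized count, assuming a seeded generator (np.random.default_rng with seed 0 is my choice for repeatability, not part of the original). The threshold is computed once and the boolean mask is counted directly:

```python
import numpy as np

mu, std = 0, 1
rng = np.random.default_rng(0)        # seed chosen only to make the run repeatable
arr = rng.normal(mu, std, 1000)

threshold = np.percentile(arr, 80)    # empirical 80th percentile, computed once
count = int(np.count_nonzero(arr > threshold))  # number of samples above it
print(count)
```

Because the threshold comes from the sample itself, the count is 200 for any 1000 distinct values, whatever the seed.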
Upvotes: 1
Reputation: 1667
If you have a normal distribution, I'll assume you know mu and sigma; otherwise this question requires some extra post-processing. Each element X you draw has a standard score Z = (X - mu)/sigma. The top 20% (above the 80th percentile) is any X that makes Z > 0.842.
You can do something like:
z = (x - mu) / sigma  # standard score of x; mu and sigma are assumed known
if z > 0.842: counter += 1
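As a rough sketch of this approach using only the standard library: statistics.NormalDist().inv_cdf(0.80) supplies the cutoff (about 0.8416) rather than hard-coding it. The seed, the names z_cut and threshold, and the use of random.gauss are my assumptions for illustration.

```python
import random
from statistics import NormalDist

mu, sigma = 0.0, 1.0                    # assumed known, as in the answer

# z-score that leaves 20% of a standard normal above it (about 0.8416)
z_cut = NormalDist().inv_cdf(0.80)
threshold = mu + z_cut * sigma          # same cutoff expressed in units of X

rng = random.Random(0)                  # seed chosen only for repeatability
counter = 0
for _ in range(1000):
    x = rng.gauss(mu, sigma)
    if x > threshold:                   # equivalent to (x - mu) / sigma > z_cut
        counter += 1

expected = 0.20 * 1000                  # mean of a Binomial(1000, 0.2) count
print(counter, expected)
```

Unlike the empirical percentile, this theoretical cutoff does not force the count to be exactly 200. The counter is a Binomial(1000, 0.2) draw with mean 200 and a standard deviation of about 12.6, so individual runs land near 200 rather than on it.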
Upvotes: 1