user308827

Converting numpy array to categories

I would like to convert a numpy array into 5 classes (very low, low, average, high, very high) based on how many standard deviations each value lies from the mean of the array: very low for values 2 or more std. dev. below the mean; low for values between 1 and 2 std. dev. below the mean; average for values within 1 std. dev. of the mean; high for values between 1 and 2 std. dev. above the mean; and very high for values more than 2 std. dev. above the mean.
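Restated in terms of each value's z-score, z = (x - mean) / std, the rule above is a fixed binning with edges at -2, -1, +1 and +2. Below is a minimal NumPy-only sketch. It assumes the population standard deviation (NumPy's default ddof=0), a non-constant array, and that a value falling exactly on an edge goes to the lower class (right-closed intervals, the same convention pd.cut uses by default). The name classify_by_std is a hypothetical helper, not an existing API.

```python
import numpy as np

def classify_by_std(arr):
    """Map each element to one of five classes by its z-score.

    Hypothetical helper. Assumes arr is not constant (std > 0);
    otherwise every z-score is NaN.
    """
    arr = np.asarray(arr, dtype=float)
    z = (arr - arr.mean()) / arr.std()  # population std (ddof=0)
    labels = np.array(["very low", "low", "average", "high", "very high"])
    # With right=True, np.digitize returns i such that
    # edges[i-1] < z <= edges[i], i.e. right-closed intervals.
    idx = np.digitize(z, [-2, -1, 1, 2], right=True)
    return labels[idx]

# Mean is 0 and std is about 4.53, so the z-scores of -13, -6, 6, 13
# are roughly -2.87, -1.33, 1.33, 2.87.
data = np.array([-13, -6] + [0] * 16 + [6, 13])
print(classify_by_std(data))
```

Binning z-scores rather than raw values keeps the edges fixed, so the same call works for any array regardless of its scale.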

I tried using stats.percentileofscore, but it returns percentile ranks rather than distances from the mean in standard deviations, so it does not give me what I want:

import numpy as np
from scipy import stats
arr = np.random.rand(100)
[stats.percentileofscore(arr, a, 'rank') for a in arr]

Alexander

You can use pd.cut from pandas, with bin edges at the mean plus or minus one and two standard deviations:

import numpy as np
import pandas as pd

sd = arr.std()
m = arr.mean()
>>> pd.cut(arr, [m - sd * 10000, m - sd * 2, m - sd, m + sd, m + sd * 2, m + sd * 10000])
[(0.204, 0.785], (0.204, 0.785], (0.785, 1.0764], (0.785, 1.0764], (0.204, 0.785], ..., (0.204, 0.785], (0.204, 0.785], (-0.0875, 0.204], (0.204, 0.785], (0.785, 1.0764]]
Length: 100
Categories (5, object): [(-2909.105, -0.0875] < (-0.0875, 0.204] < (0.204, 0.785] < (0.785, 1.0764] < (1.0764, 2910.0944]]
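The m - sd * 10000 and m + sd * 10000 outer edges only stand in for unbounded bins. A sketch of the same call using -np.inf and np.inf as the outer edges instead, so that no finite value can fall outside the first or last bin however spread out the data is. The seeded generator is an assumption added only to make the run repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for reproducibility
arr = rng.random(100)

m = arr.mean()
sd = arr.std()

# Infinite outer edges make the first and last bins open-ended.
edges = [-np.inf, m - 2 * sd, m - sd, m + sd, m + 2 * sd, np.inf]
cats = pd.cut(arr, edges)  # right-closed intervals by default

print(cats.categories)
print(cats.isna().sum())  # 0: every finite value lands in some bin
```

The printed intervals then start at -inf and end at inf instead of at arbitrary large numbers.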

To rename your categories:

buckets = (pd.Categorical(pd.cut(arr, 
               [m - sd * 10000, m - sd * 2, m - sd, m + sd, m + sd * 2, m + sd * 10000]))
           .rename_categories(['very low', 'low', 'average', 'high', 'very high']))

>>> buckets
[average, average, high, high, average, ..., average, average, low, average, high]
Length: 100
Categories (5, object): [very low, low, average, high, very high]
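pd.cut also accepts a labels argument, which names the bins in the same call and makes the separate pd.Categorical(...).rename_categories(...) step unnecessary. A sketch, again assuming infinite outer edges and a seeded generator (neither is part of the answer above):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for reproducibility
arr = rng.random(100)

m = arr.mean()
sd = arr.std()

edges = [-np.inf, m - 2 * sd, m - sd, m + sd, m + 2 * sd, np.inf]
names = ["very low", "low", "average", "high", "very high"]

# labels= needs exactly one name per bin, i.e. len(edges) - 1 entries.
buckets = pd.cut(arr, edges, labels=names)

# Count per class, listed in category order rather than by frequency.
print(pd.Series(buckets).value_counts(sort=False))
```

The result is an ordered Categorical, so comparisons such as buckets >= "high" follow the very low < ... < very high ordering.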
