Reputation: 9
I have an array [1,2,3,4,5,6,10,100,200]. I want to remove the two largest values (the outliers) from it, so the result should be [1, 2, 3, 4, 5, 6, 10].
I tried the following, but it's not working. Can anyone help me, please?
import numpy
arr = [1,2,3,4,5,6,10,100,200]
elements = numpy.array(arr)
mean = numpy.mean(elements, axis=0)
sd = numpy.std(elements, axis=0)
final_list = [x for x in arr if (x > mean - 2 * sd)]
final_list = [x for x in final_list if (x < mean + 2 * sd)]
print(final_list)
Upvotes: -2
Views: 66
Reputation: 262224
If you want to remove all items greater than or equal to the N-th largest value (here the second largest), use np.partition and boolean indexing:
import numpy as np
elements = np.array([1,2,3,4,5,6,10,100,200])
N = 2
out = elements[elements < np.partition(elements, -N)[-N]]
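To make the threshold step easier to follow, here is a small sketch of what the partition call does. The name `threshold` is introduced only for illustration; the comparison against `np.sort` is just a sanity check, not part of the approach.

```python
import numpy as np

elements = np.array([1, 2, 3, 4, 5, 6, 10, 100, 200])
N = 2

# np.partition(elements, -N) puts the N-th largest value at index -N,
# exactly where a full sort would put it, without sorting the rest.
threshold = np.partition(elements, -N)[-N]
assert threshold == np.sort(elements)[-N]

# Keep only the values strictly below that threshold.
out = elements[elements < threshold]
print(threshold, out)
```

Because the comparison is strict, every value equal to the threshold is dropped as well, which matters when the input contains ties.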
If you only want to remove the largest two, even when ties mean that more than two items reach the threshold, use argsort + argpartition instead:
N = 2
out = elements[np.argsort(np.argpartition(elements, -N))<elements.shape[0]-N]
# variant
# out = elements[np.argsort(np.argpartition(-elements, N))>=N]
Output:
array([ 1, 2, 3, 4, 5, 6, 10])
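If the mask is hard to read, a possible alternative for removing exactly N items is to pass the indices from argpartition to `np.delete`. This is only a sketch: the helper name `drop_n_largest` is hypothetical, it assumes a 1-D input, and the guards for out-of-range `n` are my own choice.

```python
import numpy as np

def drop_n_largest(values, n):
    """Hypothetical helper: return a copy of a 1-D array without its n largest items."""
    values = np.asarray(values)
    if n <= 0:
        # Nothing to remove.
        return values.copy()
    if n >= values.size:
        # Removing at least as many items as exist leaves an empty array.
        return values[:0]
    # The last n positions of the argpartition result index the n largest values.
    largest_idx = np.argpartition(values, -n)[-n:]
    # np.delete drops those positions and keeps the remaining items in their original order.
    return np.delete(values, largest_idx)

print(drop_n_largest([1, 2, 3, 4, 5, 6, 10, 100, 200], 2))
```

As with the argsort + argpartition mask, exactly n items are removed, so when tied values straddle the cut only some of them are dropped.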
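With a tie in the input (two values of 100), the partition threshold and the argsort + argpartition mask give different results (N = 2):
# elements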
array([ 1, 2, 3, 4, 5, 6, 100, 100, 200, 10])
# elements[elements < np.partition(elements, -N)[-N]]
array([ 1, 2, 3, 4, 5, 6, 10])
# elements[np.argsort(np.argpartition(elements, -N))<elements.shape[0]-N]
array([ 1, 2, 3, 4, 5, 6, 100, 10])
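As for why the attempt in the question keeps 100: the two large values inflate both the mean and the standard deviation, so the upper cutoff ends up well above 100. A quick check, using the question's data and NumPy's default population standard deviation (the names `lower`, `upper` and `kept` are just for illustration):

```python
import numpy as np

arr = [1, 2, 3, 4, 5, 6, 10, 100, 200]
elements = np.array(arr)

mean = elements.mean()
sd = elements.std()  # population standard deviation (ddof=0), as in the question

lower = mean - 2 * sd
upper = mean + 2 * sd

# Same filter as in the question, written as one chained comparison.
kept = [x for x in arr if lower < x < upper]
print(round(upper, 1), kept)
```

The upper cutoff comes out at roughly 167, so only 200 is removed and 100 survives. A fixed "mean plus two standard deviations" rule does not guarantee that exactly two values are dropped.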
Upvotes: 1