babe_engineer

Reputation: 9

Remove the largest outliers from an array in Python

I have an array [1,2,3,4,5,6,10,100,200]. I want to remove the two largest values (the outliers) from it. The result should be [1, 2, 3, 4, 5, 6, 10].

I tried this, but it's not working. Can anyone help me, please?

import numpy

arr = [1,2,3,4,5,6,10,100,200]

elements = numpy.array(arr)

# mean and standard deviation of the whole array
mean = numpy.mean(elements, axis=0)
sd = numpy.std(elements, axis=0)

# keep only the values within 2 standard deviations of the mean
final_list = [x for x in arr if (x > mean - 2 * sd)]
final_list = [x for x in final_list if (x < mean + 2 * sd)]
print(final_list)

Upvotes: -2

Views: 66

Answers (1)

mozway

Reputation: 262224

If you want to remove all items greater than or equal to the Nth largest value (here N = 2, the second largest), use np.partition and boolean indexing:

import numpy as np

elements = np.array([1,2,3,4,5,6,10,100,200])

N = 2
out = elements[elements < np.partition(elements, -N)[-N]]
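To make the one-liner above easier to follow, here is a minimal sketch that splits it into steps. The names `partitioned` and `threshold` are only illustrative; the idea is that after `np.partition(elements, -N)` the value at index `-N` is the Nth largest element, and everything strictly below it is kept.

```python
import numpy as np

elements = np.array([1, 2, 3, 4, 5, 6, 10, 100, 200])
N = 2

# After partitioning with kth=-N, position -N holds the value it would
# have in a fully sorted array, i.e. the Nth largest element.
partitioned = np.partition(elements, -N)
threshold = partitioned[-N]  # illustrative name; 100 for this input

# Keep only the values strictly below that threshold.
out = elements[elements < threshold]
print(out)
```

Because the comparison is strict, every value equal to the threshold is dropped as well, which matters when there are ties.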

If you only want to remove exactly the N largest items, even when ties mean that more than N items reach the threshold, use argsort + argpartition instead:

N = 2
out = elements[np.argsort(np.argpartition(elements, -N))<elements.shape[0]-N]
# variant
# out = elements[np.argsort(np.argpartition(-elements, N))>=N]

Output:

array([ 1,  2,  3,  4,  5,  6, 10])
Difference in behavior when there are ties:
# elements
array([  1,   2,   3,   4,   5,   6, 100, 100, 200,  10])

# elements[elements < np.partition(elements, -N)[-N]]
array([ 1,  2,  3,  4,  5,  6, 10])

# elements[np.argsort(np.argpartition(elements, -N))<elements.shape[0]-N]
array([  1,   2,   3,   4,   5,   6, 100,  10])
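As a possibly more readable alternative to the argsort + argpartition expression, the same "drop exactly N items" behavior can be sketched with a boolean mask. This is a different technique from the one-liner above, and `drop_n_largest` is a hypothetical helper name, not a NumPy function. It assumes a 1-D input.

```python
import numpy as np

def drop_n_largest(values, n):
    """Return a 1-D array without its n largest entries, keeping the original order."""
    values = np.asarray(values)
    if n <= 0:
        return values.copy()
    if n >= values.size:
        return values[:0]  # nothing left to keep
    # Indices of the n largest entries (in no particular order).
    top_idx = np.argpartition(values, -n)[-n:]
    # Mask out exactly those n positions.
    keep = np.ones(values.size, dtype=bool)
    keep[top_idx] = False
    return values[keep]

tied = np.array([1, 2, 3, 4, 5, 6, 100, 100, 200, 10])
result = drop_n_largest(tied, 2)
print(result)
```

With the tied input, only 200 and one of the two 100s are removed, so one 100 stays in the result.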

Upvotes: 1
