Reputation: 429
I am trying to remove outliers from a list in Python. I want to get the index of each outlier in the original list so that I can remove the element at the same index from a second, corresponding list.
Simple example
My list with outliers:
y = [1,2,3,4,500]  # 500 is the outlier; it has an index of 4
My corresponding list:
x = [1,2,3,4,5]  # I want to remove 5, which has the same index of 4
My desired result:
y=[1,2,3,4]
x=[1,2,3,4]
This is my code. I want to achieve the same with klist and avglatlist:
import numpy as np
klist=['1','2','3','4','5','6','7','8','4000']
avglatlist=['1','2','3','4','5','6','7','8','9']
klist = np.array(klist).astype(np.float)
klist=klist[(abs(klist - np.mean(klist))) < (2 * np.std(klist))]
indices=[]
for k in klist:
    if (k-np.mean(klist))>((2*np.std(klist))):
        i=klist.index(k)
        indices.append(i)
print('indices'+str(indices))
avglatlist = np.array(avglatlist).astype(np.float)
for index in sorted(indices, reverse=True):
    del avglatlist[index]
print(len(klist))
print(len(avglatlist))
Upvotes: 1
Views: 2250
Reputation: 18913
How to get the index values of each outlier in a list?
Say an outlier is defined as a value more than 2 standard deviations from the mean. You then want the indices of the values in the list whose z-scores have an absolute value greater than 2.
I would use np.where:
import numpy as np
from scipy.stats import zscore
klist = np.array([1, 2, 3, 4, 5, 6, 7, 8, 4000])
avglatlist = np.arange(1, klist.shape[0] + 1)
indices = np.where(np.absolute(zscore(klist)) > 2)[0]
indices_filter = [i for i,n in enumerate(klist) if i not in indices]
print(avglatlist[indices_filter])
If you don't actually need to know the indices, use a boolean mask instead:
import numpy as np
from scipy.stats import zscore
klist = np.array([1, 2, 3, 4, 5, 6, 7, 8, 4000])
avglatlist = np.arange(1, klist.shape[0] + 1)
mask = np.absolute(zscore(klist)) > 2
print(avglatlist[~mask])
Both solutions print:
[1 2 3 4 5 6 7 8]
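If SciPy is not available, the same idea can be sketched with NumPy alone. This is only an illustration under the same "more than 2 standard deviations" assumption: the z-score is computed by hand, np.flatnonzero gives the outlier positions, and np.delete drops those positions from both arrays. The names z, outlier_indices, klist_clean and avglatlist_clean are mine, not from the question.

```python
import numpy as np

klist = np.array([1, 2, 3, 4, 5, 6, 7, 8, 4000], dtype=float)
avglatlist = np.arange(1, klist.shape[0] + 1)

# z-score by hand: distance from the mean in units of the
# population standard deviation (np.std defaults to ddof=0)
z = (klist - klist.mean()) / klist.std()

# Positions of the outliers in the original array
outlier_indices = np.flatnonzero(np.abs(z) > 2)

# np.delete returns a copy without those positions; the inputs are untouched
klist_clean = np.delete(klist, outlier_indices)
avglatlist_clean = np.delete(avglatlist, outlier_indices)

print(outlier_indices)   # [8]
print(avglatlist_clean)  # [1 2 3 4 5 6 7 8]
```

Because the indices are computed from the original klist, they can be reused on any other array of the same length.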
Upvotes: 1
Reputation: 9968
You are really close. All you need to do is apply the same filtering regime to a NumPy version of avglatlist. I've changed a few variable names for clarity.
import numpy as np
klist = ['1', '2', '3', '4', '5', '6', '7', '8', '4000']
avglatlist = ['1', '2', '3', '4', '5', '6', '7', '8', '9']
# np.float was removed in NumPy 1.24; the builtin float does the same job
klist_np = np.array(klist).astype(float)
avglatlist_np = np.array(avglatlist).astype(float)
# Build the mask once from klist, then apply it to both arrays
keep = abs(klist_np - np.mean(klist_np)) < (2 * np.std(klist_np))
klist_filtered = klist_np[keep]
avglatlist_filtered = avglatlist_np[keep]
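If you would rather get plain Python lists back, along with the indices that were removed, something like the following might work. It is a rough sketch, not a tested library routine: drop_outliers, its n_std parameter and its return shape are hypothetical names I made up for illustration, and it assumes both lists have the same length and that the first one holds values convertible to float.

```python
import numpy as np

def drop_outliers(values, companion, n_std=2.0):
    """Hypothetical helper: drop entries of `values` lying more than
    n_std standard deviations from the mean, and drop the entries at
    the same positions from `companion`."""
    arr = np.array(values).astype(float)
    # Boolean mask computed on the ORIGINAL, unfiltered data
    keep = np.abs(arr - arr.mean()) <= n_std * arr.std()
    removed = np.flatnonzero(~keep).tolist()
    kept_values = [v for v, k in zip(values, keep) if k]
    kept_companion = [c for c, k in zip(companion, keep) if k]
    return kept_values, kept_companion, removed

klist = ['1', '2', '3', '4', '5', '6', '7', '8', '4000']
avglatlist = ['1', '2', '3', '4', '5', '6', '7', '8', '9']

klist_kept, avglatlist_kept, removed = drop_outliers(klist, avglatlist)
print(removed)          # [8]
print(avglatlist_kept)  # ['1', '2', '3', '4', '5', '6', '7', '8']
```

The point of the design is that the mask and the indices come from the unfiltered data. In the question, klist has already been filtered by the time the loop searches it for outliers, so the loop finds nothing to remove from avglatlist. Note also that NumPy arrays have no .index method and do not support del on single elements.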
Upvotes: 0