Reputation: 87
This is my very first question here, so I hope I'll be clear enough :)
I'm trying to write an outlier function that takes 3 arguments:
-df: a Pandas dataframe
-L: a list containing some of this dataframe's columns
-threshold: a threshold we can choose, knowing that I'm using the z_score method in this function.
Here is the function I'm trying to implement:
    def out1(df,L,threshold):
        liste=[]
        for i in L:
            dico={}
            try:
                dico['Column Name']=i
                dico['Number of outliers']=len(np.where(np.abs(stats.zscore(df[L[i]])>threshold))[0])
                dico['Top 10 outliers']='a' #I'll fill this later
                dico['Exception']=None
            except Exception as e:
                dico['Exception']=str(e)
            liste.append(dico)
        return(liste)
I have to use an exception here because not all the columns of df are necessarily numerical (so L can contain column names that are not numerical), and it would be nonsense to use the z_score method and look for outliers in those columns.
However, I tried to run this code with:
-df: a simple dataframe I have
-L=['Terminations'] (a numerical column of my dataframe df)
-threshold=2
And this is what Python 2.7 returns:
Out[8]:
[{'Column Name': 'Terminations',
'Exception': 'list indices must be integers, not str'}]
I'm not even sure whether this has anything to do with the try...except, so I could really use any help to solve my problem!
Thank you in advance,
Alex
EDIT: I haven't really made clear what I was expecting as an output.
Let's say the argument L only contains 1 element:
So L=['One column name of df']
This column is either numerical (so I want to apply the z_score method) or not (so I want to raise an exception).
If this column is numerical, the output would be:
[{'Column Name': 'One column name of df', 'Number of outliers': xxx, 'Top 10 outliers': [I'll make it a list later], 'Exception': None}]
If the column is not numerical, it would be:
[{'Column Name': 'One column name of df', 'Number of outliers': None, 'Top 10 outliers': None, 'Exception': 'The column you chose is not numerical'}]
Upvotes: 1
Views: 83
Reputation: 198294
In the loop header for i in L, the variable i receives each column name (a string), not an index. Later you write L[i], which is redundant and wrong, and is the cause of the "list indices must be integers, not str" exception. Writing df[i] instead of df[L[i]] removes the error.
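To make the cause concrete, here is a minimal sketch that reproduces the error without pandas. The list L is the one from the question; the name error_message is only introduced for illustration. Note that the exact wording of the message differs slightly between Python versions.

```python
L = ['Terminations']  # column names, as in the question

for i in L:
    # i is the element itself ('Terminations'), not a position in L
    assert isinstance(i, str)
    try:
        L[i]  # same as L['Terminations']: a list indexed with a string
    except TypeError as e:
        error_message = str(e)  # illustrative name, not from the question

print(error_message)
```

Because the function in the question catches every Exception, this TypeError ends up stored in the 'Exception' key instead of stopping the program.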
As a teachable moment, this is a good time to suggest better variable naming: if you had written for column_name in column_names: instead, it would likely not have occurred to you to write column_names[column_name]. :)
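Putting both points together, a corrected version might look like the sketch below. It is only one possible shape, under a few assumptions: the names count_outliers, column_names and summary are made up here, the explicit dtype check with pd.api.types.is_numeric_dtype is a design choice that goes beyond the one-line fix (it produces the message from the question's EDIT), and np.abs is applied to the z-scores before the comparison, since in the question's code it seems to wrap the comparison itself.

```python
import numpy as np
import pandas as pd
from scipy import stats


def count_outliers(df, column_names, threshold):
    """Return one summary dict per column, using the z-score method."""
    results = []
    for column_name in column_names:
        summary = {
            'Column Name': column_name,
            'Number of outliers': None,
            'Top 10 outliers': None,  # left unfilled, as in the question
            'Exception': None,
        }
        try:
            # Assumption: report non-numerical columns with a fixed message.
            if not pd.api.types.is_numeric_dtype(df[column_name]):
                raise TypeError('The column you chose is not numerical')
            # Index the dataframe with the name itself, not column_names[column_name].
            z_scores = stats.zscore(df[column_name])
            # Take the absolute value first, then compare with the threshold.
            summary['Number of outliers'] = int((np.abs(z_scores) > threshold).sum())
        except Exception as e:
            summary['Exception'] = str(e)
        results.append(summary)
    return results
```

Called as count_outliers(df, ['Terminations'], 2), this should return a one-element list whose 'Exception' entry is None when the column is numerical.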
Upvotes: 1