Reputation: 147
I am trying to apply a function to a dataframe in pandas. I want to pass two columns as positional arguments and map the function over them. Below is the code I tried:
import pandas as pd

df_a=pd.read_csv('5_a.csv')

def y_pred(x):
    if x<.5:
        return 0
    else:
        return 1

df_a['y_pred']=df_a['proba'].map(y_pred)

def confusion_matrix(act,pred):
    if act==1 and act==pred:
        return 'TP'
    elif act==0 and act==pred:
        return 'TN'
    elif act==0 and pred==1:
        return 'FN'
    elif act==1 and pred==0:
        return 'FP'

df_a['con_mat_label']=df_a[['y','y_pred']].apply(confusion_matrix)
But the function does not treat y_pred as the second column and map it to the pred variable of the function.
I am getting this error:
TypeError: ("confusion_matrix() missing 1 required positional argument: 'pred'", 'occurred at index y')
Upvotes: 0
Views: 90
Reputation: 5757
The argument passed to the function you give to the apply method is a pandas Series, and the axis argument specifies whether that Series is a row or a column. By default (axis=0) it is a column, which is why your two-argument function receives only one argument.
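To see what apply actually hands to the function, here is a minimal sketch on a made-up two-row frame. The data and the helper name `describe` are assumptions for illustration only, not part of the question. With the default `axis=0` the function is called once per column (hence "occurred at index y" in the error); with `axis=1` it is called once per row.

```python
import pandas as pd

# Hypothetical data, only to illustrate what apply passes in.
df = pd.DataFrame({"y": [1, 0], "y_pred": [1, 1]})

def describe(s):
    # s is always a pandas Series; report its name and its index labels.
    return f"name={s.name}, index={list(s.index)}"

by_column = df.apply(describe)        # axis=0 (default): one call per column
by_row = df.apply(describe, axis=1)   # axis=1: one call per row

print(by_column)
print(by_row)
```

In the column case the Series is named after the column and indexed by row labels; in the row case it is named after the row label and indexed by column names, which is what makes `row.y` and `row.y_pred` available.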
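So you need to modify your confusion_matrix function as follows (row.y takes the place of act, since act corresponds to the column name y here):
def confusion_matrix(row):
    # row is a Series for one row; row.y is the actual label, row.y_pred the prediction
    if row.y==1 and row.y==row.y_pred:
        return 'TP'
    elif row.y==0 and row.y==row.y_pred:
        return 'TN'
    elif row.y==0 and row.y_pred==1:
        return 'FP'  # actual 0, predicted 1 is a false positive (FP and FN were swapped in the question)
    elif row.y==1 and row.y_pred==0:
        return 'FN'  # actual 1, predicted 0 is a false negative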
And you need to modify your apply call to:
df_a['con_mat_label']=df_a[['y','y_pred']].apply(confusion_matrix, axis=1)
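Putting the two changes together, a self-contained sketch might look like this. The four-row frame is invented for illustration (the original 5_a.csv is not available here), and the labels follow the usual convention in which actual 0 with predicted 1 is a false positive.

```python
import pandas as pd

# Made-up stand-in for 5_a.csv; only the column names come from the question.
df_a = pd.DataFrame({"y": [1, 0, 0, 1], "y_pred": [1, 0, 1, 0]})

def confusion_matrix(row):
    # row is a Series holding one row; its fields are reachable by column name.
    if row.y == 1 and row.y_pred == 1:
        return "TP"
    elif row.y == 0 and row.y_pred == 0:
        return "TN"
    elif row.y == 0 and row.y_pred == 1:
        return "FP"  # actual negative, predicted positive
    elif row.y == 1 and row.y_pred == 0:
        return "FN"  # actual positive, predicted negative

# axis=1 makes apply call the function once per row instead of once per column.
df_a["con_mat_label"] = df_a[["y", "y_pred"]].apply(confusion_matrix, axis=1)
print(df_a)
```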
Now let me give you some tips on how you could improve your code.
Say you have a data frame like this:
>>> df
   X  Y
0  1  4
1  2  5
2  3  6
3  4  7
To add a Y_pred column:
>>> df['Y_pred'] = (df.X < 3).astype(int)
>>> df
   X  Y  Y_pred
0  1  4       1
1  2  5       1
2  3  6       0
3  4  7       0
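Applied to the question's data, the same idea replaces the y_pred helper and the map call with a single comparison. This is a sketch that assumes a proba column of floats; the values below are invented.

```python
import pandas as pd

df_a = pd.DataFrame({"proba": [0.1, 0.5, 0.49, 0.93]})  # invented values

# Same rule as y_pred(x) for non-missing values: 0 if x < .5, else 1.
df_a["y_pred"] = (df_a["proba"] >= 0.5).astype(int)
print(df_a)
```

One difference to be aware of: for a missing proba (NaN), the original function returns 1 because `x < .5` is False, while this comparison yields 0.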
Oh, btw, I would like to refer you to this interesting blog post
Upvotes: 1
Reputation: 731
By default, the apply function takes each column one by one, runs it through the function, and returns a transformed column. There is more detail on this in the pandas documentation. Your setup would be better suited to a list comprehension. Here is how you can get the intended behavior:
df_a['con_mat_label'] = [confusion_matrix(act,pred) for (act,pred) in df_a[['y','y_pred']].to_numpy()]
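For reference, here is a runnable sketch of the list-comprehension approach that keeps the two-argument function from the question. It uses `zip` over the two columns as a variant of `to_numpy()`, invented data, and the usual FP/FN convention; treat it as an illustration under those assumptions.

```python
import pandas as pd

df_a = pd.DataFrame({"y": [1, 0, 0, 1], "y_pred": [1, 0, 1, 0]})  # invented values

def confusion_matrix(act, pred):
    # Two positional arguments, as in the question.
    if act == 1 and pred == 1:
        return "TP"
    elif act == 0 and pred == 0:
        return "TN"
    elif act == 0 and pred == 1:
        return "FP"  # actual negative, predicted positive
    elif act == 1 and pred == 0:
        return "FN"  # actual positive, predicted negative

# zip pairs the two columns element by element, one (act, pred) pair per row.
df_a["con_mat_label"] = [
    confusion_matrix(act, pred) for act, pred in zip(df_a["y"], df_a["y_pred"])
]
print(df_a)
```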
Hope it helps!
Upvotes: 1