Mapping a function to a dataframe

I am trying to apply a function to a DataFrame in pandas. I want to pass two columns as positional arguments to the function. Below is the code I tried:

import pandas as pd

df_a=pd.read_csv('5_a.csv')
def y_pred(x):
    if x<.5:
        return 0
    else:
        return 1
df_a['y_pred']=df_a['proba'].map(y_pred)
def confusion_matrix(act,pred):
    if act==1 and act==pred:
        return 'TP'
    elif act==0 and act==pred:
        return 'TN'
    elif act==0 and pred==1:
        return 'FN'
    elif act==1 and pred==0:
        return 'FP'
df_a['con_mat_label']=df_a[['y','y_pred']].apply(confusion_matrix)

But the function does not treat y_pred as the second column and map it to the pred argument of the function. I am getting this error: TypeError: ("confusion_matrix() missing 1 required positional argument: 'pred'", 'occurred at index y')

Upvotes: 0

Views: 90

Answers (2)

abhilb

Reputation: 5757

The function you pass to apply receives a single pandas Series as its argument. The axis argument controls whether that Series is a column (axis=0, the default) or a row (axis=1). That is why your two-argument function fails: it is called with one column at a time.
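A small sketch of that behavior, using a made-up two-column frame (the data and the lambda are illustrative and not taken from the question):

```python
import pandas as pd

# Toy data, invented for illustration
df = pd.DataFrame({"y": [1, 0, 1], "y_pred": [1, 1, 0]})

# Default (axis=0): the function receives each column as a Series,
# so len(s) is the number of rows
per_column = df.apply(lambda s: len(s))

# axis=1: the function receives each row as a Series,
# so len(s) is the number of columns
per_row = df.apply(lambda s: len(s), axis=1)

print(per_column.to_dict())  # one entry per column
print(per_row.to_dict())     # one entry per row
```

With axis=1 the Series is indexed by column name, which is what makes row.y and row.y_pred available inside the function.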

So you need to modify your confusion_matrix function as follows. (I am assuming that act corresponds to the column y here.)

def confusion_matrix(row):
    if row.y==1 and row.y==row.y_pred:
        return 'TP'
    elif row.y==0 and row.y==row.y_pred:
        return 'TN'
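    elif row.y==0 and row.y_pred==1:
        return 'FP'  # predicted 1, actual 0: false positive (the question had 'FN' here)
    elif row.y==1 and row.y_pred==0:
        return 'FN'  # predicted 0, actual 1: false negative (the question had 'FP' here)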

And you need to modify your apply call to

df_a['con_mat_label']=df_a[['y','y_pred']].apply(confusion_matrix, axis=1)
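If you would rather keep the original two-argument signature, one option is to unpack the row in a small lambda. This is only a sketch: the toy frame below stands in for the CSV, and the function uses the standard labelling (actual 0 with predicted 1 is a false positive), which is an assumption about what was intended.

```python
import pandas as pd

def confusion_matrix(act, pred):
    # Standard labels (assumed): the first letter says whether the
    # prediction was right, the second is the predicted class
    if act == 1 and pred == 1:
        return "TP"
    if act == 0 and pred == 0:
        return "TN"
    if act == 0 and pred == 1:
        return "FP"
    if act == 1 and pred == 0:
        return "FN"
    return None  # values other than 0/1

# Toy stand-in for the data read from the CSV
df_a = pd.DataFrame({"y": [1, 0, 0, 1], "y_pred": [1, 0, 1, 0]})

# axis=1 hands each row to the lambda, which unpacks the two columns
df_a["con_mat_label"] = df_a.apply(
    lambda row: confusion_matrix(row["y"], row["y_pred"]), axis=1
)
print(df_a)
```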


Now let me give you some tips on how you could improve your code.

Say you have a data frame like this:

>>> df
   X  Y
0  1  4
1  2  5
2  3  6
3  4  7

To add a Y_pred column

>>> df['Y_pred'] = (df.X < 3).astype(int)
>>> df
   X  Y  Y_pred
0  1  4       1
1  2  5       1
2  3  6       0
3  4  7       0
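Applied to the columns from the question, the same idea might look like the sketch below. The toy values are invented, and using numpy.select for the labels is my suggestion rather than something the question asked for; it avoids calling a Python function once per row.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the CSV; the values are made up
df_a = pd.DataFrame({"proba": [0.9, 0.2, 0.7, 0.4], "y": [1, 0, 0, 1]})

# Same rule as y_pred(x) in the question: 0 below 0.5, otherwise 1
df_a["y_pred"] = (df_a["proba"] >= 0.5).astype(int)

# One boolean mask per label, checked in order
conditions = [
    (df_a["y"] == 1) & (df_a["y_pred"] == 1),
    (df_a["y"] == 0) & (df_a["y_pred"] == 0),
    (df_a["y"] == 0) & (df_a["y_pred"] == 1),
    (df_a["y"] == 1) & (df_a["y_pred"] == 0),
]
labels = ["TP", "TN", "FP", "FN"]

# default="" covers rows where y or y_pred is neither 0 nor 1
df_a["con_mat_label"] = np.select(conditions, labels, default="")
print(df_a)
```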

Oh btw, I would like to refer you to this interesting blog post

Upvotes: 1

Yacine Mahdid

Reputation: 731

By default, the apply function takes each column one by one, runs it through the function, and returns a transformed column. There is more on this in the pandas documentation. Your setup would be better suited to a list comprehension. Here is how you can get the intended behavior:

df_a['con_mat_label'] = [confusion_matrix(act,pred) for (act,pred) in df_a[['y','y_pred']].to_numpy()]

Hope it helps!

Upvotes: 1
