Reputation: 433
I am working on a sentiment analysis project and want to examine my model's performance from different angles.
Below is an example table:
|Topic|value|label|
|A    |-0.99|0    |
|B    |-0.98|1    |
|c    |-0.93|0    |
|d    |0.8  |1    |
|e    |0.91 |1    |
|f    |0.97 |0    |
My goal is to calculate the Euclidean distance between the value and label columns for each row and store it in a new column of the dataframe.
Here, a positive value with label 1 is desirable, and a negative value with label 0 is desirable. The opposite combinations are undesirable.
I want to study the deviations, so I want to plot distance against value.
Being new to ML and Python, I came across some internet results like:
from numpy.linalg import norm
df['distance'] = norm(df['value'] - df['label'])
df.plot(x='compound', y='distance')
But I get a straight line, because every row in the distance column has the same value.
I want the distance for each individual pair, so I also tried:
import math
df['Score'] = math.sqrt((df['value'] - df['label'])**2)
df.head(2)
But this raises an error. Can anyone please help me with this?
Upvotes: 1
Views: 2917
Reputation: 46888
What you found calculates the Euclidean distance between two vectors. You get a single value because it treats the whole value column as one data point and the whole label column as another, and gives you the Euclidean distance between these two points.
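To make that concrete, here is a minimal sketch using the sample data from the question (the variable names diff and single_distance are mine, for illustration only). It shows that norm returns one scalar for the whole column, and that assigning a scalar to a column simply repeats it in every row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [-0.99, -0.98, -0.93, 0.8, 0.91, 0.97],
    'label': [0, 1, 0, 1, 1, 0],
})

diff = df['value'] - df['label']

# norm() collapses the whole difference vector into one number:
# sqrt(sum(diff ** 2)).
single_distance = np.linalg.norm(diff)

# Assigning a scalar to a column broadcasts it to every row,
# which is why plotting this column gives a flat line.
df['distance'] = single_distance
```

Every entry of df['distance'] ends up identical, which matches the straight line you saw.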
Since each row has only one coordinate, the Euclidean distance reduces to the absolute difference, which is what you need:
import pandas as pd

df = pd.DataFrame({'Topic': ['A', 'B', 'c', 'd', 'e', 'f'],
                   'value': [-0.99, -0.98, -0.93, 0.8, 0.91, 0.97],
                   'label': [0, 1, 0, 1, 1, 0]})
# Element-wise absolute difference: one distance per row
df['distance'] = (df['value'] - df['label']).abs()
df.plot.scatter(x='value', y='distance')
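As for the math.sqrt attempt in the question: math.sqrt works on a single number, so passing a whole Series is most likely what triggers the error. A rough sketch of the difference, assuming the same sample data (math_sqrt_failed is just an illustrative flag, not part of any library):

```python
import math
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [-0.99, -0.98, -0.93, 0.8, 0.91, 0.97],
    'label': [0, 1, 0, 1, 1, 0],
})

diff = df['value'] - df['label']

# math.sqrt expects one number; a multi-row Series cannot be
# converted to a float, so this raises TypeError.
try:
    math.sqrt(diff ** 2)
    math_sqrt_failed = False
except TypeError:
    math_sqrt_failed = True

# np.sqrt is applied element-wise and returns one result per row.
df['Score'] = np.sqrt(diff ** 2)
```

The np.sqrt version gives the same numbers as the absolute difference, so the abs approach is simply the shorter way to write it.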
Upvotes: 1