How to calculate the distance between two columns and plot them in pandas

Question

I am working on a sentiment analysis project, and I am trying to look at my model's performance from different angles.

Below is an example table:

|Topic|value|label|
|A    |-0.99|0    |
|B    |-0.98|1    |
|c    |-0.93|0    |
|d    |0.8  |1    |
|e    |0.91 |1    |
|f    |0.97 |0    |

My goal is to calculate the Euclidean distance between the value and label columns for each row and store it in a new column of the DataFrame.

If the value is positive and the label is 1, that is desirable, and if the value is negative and the label is 0, that is also desirable. The opposite combinations are undesirable.

I want to study the deviations, so I want to plot the distance against the value column.

Being new to ML and Python, I came across some suggestions online like:

from numpy.linalg import norm

df['distance'] = norm(df['value'] - df['label'])

df.plot(x='value', y='distance')

But I get a straight line, because every row in the distance column has the same value.

I want the distance for each individual pair, so I am also trying:

import math
df['Score'] = math.sqrt((df['value'] - df['label'])**2)
df.head(2)

But this raises an error. Can anyone please help me with this?

StupidWolf · Accepted Answer

What you found calculates the Euclidean distance between two vectors. You get a single value because it treats the whole value column as one point and the whole label column as another point, and returns the Euclidean distance between those two points.
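To make that concrete, here is a minimal sketch using the example data from the question. The variable name `single` is only for illustration. It shows that `norm` returns one number and that assigning it to a column repeats that number on every row, which would explain the flat line in the plot.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [-0.99, -0.98, -0.93, 0.8, 0.91, 0.97],
                   'label': [0, 1, 0, 1, 1, 0]})

# norm() collapses the whole difference vector into one number:
# sqrt(sum((value - label) ** 2))
single = np.linalg.norm(df['value'] - df['label'])

# Assigning a scalar to a column broadcasts it to every row
df['distance'] = single

print(df['distance'].nunique())  # 1 distinct value across all rows
```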

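Since each row has only one coordinate, what you need is the absolute difference between the two columns:

import pandas as pd

df = pd.DataFrame({'Topic': ['A', 'B', 'c', 'd', 'e', 'f'],
                   'value': [-0.99, -0.98, -0.93, 0.8, 0.91, 0.97],
                   'label': [0, 1, 0, 1, 1, 0]})

# Per-row absolute difference between the score and the label
df['distance'] = abs(df['value'] - df['label'])

# Scatter plot of the distance against the value
df.plot.scatter(x='value', y='distance')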
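As for the second attempt in the question, the error most likely comes from `math.sqrt`, which expects a single number rather than a whole Series. The sketch below is an illustration on the same example data, not part of the original attempt: it catches that error and shows that the element-wise `np.sqrt` gives the same per-row result as the absolute difference. The names `diff` and `error_message` are made up for this example.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [-0.99, -0.98, -0.93, 0.8, 0.91, 0.97],
                   'label': [0, 1, 0, 1, 1, 0]})

diff = df['value'] - df['label']

# math.sqrt works on one float at a time, so passing a Series fails
error_message = None
try:
    math.sqrt(diff ** 2)
except TypeError as exc:
    error_message = str(exc)
    print("math.sqrt failed:", error_message)

# np.sqrt is applied element-wise, one result per row
df['Score'] = np.sqrt(diff ** 2)

# sqrt(x ** 2) is just |x|, so this matches the absolute difference
df['distance'] = diff.abs()
print(np.allclose(df['Score'], df['distance']))  # True
```

With one coordinate per row, the square and square root cancel out, so `abs` is the simpler way to write the same thing.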
