rohith p

Reputation: 75

Scale between -1 and 1

I have a data frame with the positive, negative, and neutral sentiment percentages of a text, and I am trying to combine them into a single number between -1 (most negative) and 1 (most positive). What would be the best formula to determine this score?
Dataframe example:
Data columns (total 11 columns):

 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----  
 0   kind              200 non-null    object
 1   etag              200 non-null    object
 2   id                200 non-null    object
 3   positive          200 non-null    float64
 4   negative          200 non-null    float64
 5   neutral           200 non-null    float64

A new field called score needs to be added using an appropriate formula. Example values, taken from the output for video number 49:
Positive sentiment : 39.37210499227998
Negative sentiment : 18.57951621204323
Neutral sentiment : 42.04837879567679

Upvotes: 1

Views: 1642

Answers (3)

Doracahl

Reputation: 532

One way to scale values is to use sklearn's MinMaxScaler class. Pass a feature_range argument when creating it to set the lower and upper bounds of the output. Here's a working demo:

from sklearn.preprocessing import MinMaxScaler

data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
# Each column is rescaled independently to the range [-1, 1].
scaler = MinMaxScaler(feature_range=(-1, 1))
scaler.fit(data)
print(scaler.transform(data))

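The same approach works on a Pandas dataframe. This example assumes the frame has a single column named sentiment to scale:

from sklearn.preprocessing import MinMaxScaler

# Double brackets keep the selection 2-D, which the scaler expects.
data = df[['sentiment']]
scaler = MinMaxScaler(feature_range=(-1, 1))
# fit_transform returns an (n, 1) array; ravel() flattens it into one column.
df['scaled'] = scaler.fit_transform(data).ravel()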

Upvotes: 0

Ralvi Isufaj

Reputation: 482

This can be seen as min-max scaling. To get a value in [-1,1] one would do:

val = (2 *(val - min)/(max-min)) - 1

Here val is the value being normalized, min is the smallest of all values, and max is the largest of all values.
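As a rough sketch, the formula above could be written in plain Python like this. The function name scale_to_signed_unit and the sample values are made up for illustration, and returning 0.0 when all values are equal is just one possible way to avoid dividing by zero:

```python
def scale_to_signed_unit(values):
    """Linearly rescale values so the smallest maps to -1 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so the formula would divide by zero.
        # Returning 0.0 for each is an arbitrary choice, not the only one.
        return [0.0 for _ in values]
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]


scaled = scale_to_signed_unit([2, 6, 10, 18])
print(scaled)  # [-1.0, -0.5, 0.0, 1.0]
```

Note that min and max come from the data itself, so the result for a given value depends on which other values are in the set.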

Upvotes: 1

malanb5

Reputation: 314

I would just assign positive sentiment a score of 1, negative sentiment a score of -1, and neutral a score of 0. Then weight each score by its percentage to get a composite score.

So for the example mentioned, the score would be

score = positive% * positive_score + neutral% * neutral_score + negative% * negative_score

score = .3937 * 1 + .4205 * 0 + .1858 * -1
score = .2079

Intuitively this makes sense because if we had all positive scores then we'd have a max score of 1. If we had all negative scores then we would have a min score of -1, and neutral a score of 0.

You can use the dataframe's iterrows method to iterate through the rows and write a function that combines those scores into a new column or updates an existing one.
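As an alternative to looping over rows, the same weighted sum can be computed on whole columns at once. This is only a sketch: the sample frame is made up (its first row is the example from the question), and it assumes the three columns hold percentages that add up to 100. Because the neutral weight is 0, the neutral column drops out of the formula:

```python
import pandas as pd

# Hypothetical sample data; the first row is the example from the question.
df = pd.DataFrame({
    "positive": [39.37210499227998, 100.0, 0.0],
    "negative": [18.57951621204323, 0.0, 100.0],
    "neutral": [42.04837879567679, 0.0, 0.0],
})

# Weights: positive = 1, neutral = 0, negative = -1.
# Dividing by 100 converts percentages to fractions, giving a score in [-1, 1].
df["score"] = (df["positive"] - df["negative"]) / 100

print(df["score"].round(4).tolist())
```

The first row comes out at about 0.2079, matching the hand calculation above, while the all-positive and all-negative rows give 1 and -1.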

Upvotes: 0
