Reputation: 75
I have a data frame with positive, negative, and neutral sentiment percentages for a text, and I am trying to scale this data into a single number between -1 (most negative) and 1 (most positive). What would be the best formula to determine this score?
Dataframe example:
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 kind 200 non-null object
1 etag 200 non-null object
2 id 200 non-null object
3 positive 200 non-null float64
4 negative 200 non-null float64
5 neutral 200 non-null float64
A new field called score needs to be added using an appropriate formula.
Example score:
Downloading comments of Video Number : 49
Positive sentiment : 39.37210499227998
Negative sentiment : 18.57951621204323
Neutral sentiment : 42.04837879567679
Upvotes: 1
Views: 1642
Reputation: 532
One way to scale values is to use sklearn's MinMaxScaler class. Pass a feature_range argument when constructing it to define the lower and upper bounds of the output. Here's a working demo:
from sklearn.preprocessing import MinMaxScaler
# Two features (columns); each column is scaled independently.
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler(feature_range=(-1, 1))
# fit() learns the per-column min and max; transform() applies the scaling.
scaler.fit(data)
scaler.transform(data)
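Please see below for a working example when using a Pandas dataframe:
from sklearn.preprocessing import MinMaxScaler
# df is assumed to be an existing DataFrame with a numeric 'sentiment' column.
data = df[['sentiment']]
scaler = MinMaxScaler(feature_range=(-1, 1))
# fit_transform returns an (n, 1) array; ravel() flattens it to one value per row.
df['scaled'] = scaler.fit_transform(data).ravel()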
Upvotes: 0
Reputation: 482
This can be seen as min-max scaling. To get a value in [-1, 1] one would do:
val = (2 * (val - min) / (max - min)) - 1
Here val is the current value being normalized, min is the smallest of all values, and max is the largest of all values.
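The formula above can be sketched as a small Python function. This is only a minimal illustration: the function name scale_to_signed_unit and the sample values are made up for the example, and the guard for max equal to min is an added assumption, since the formula would otherwise divide by zero.

```python
def scale_to_signed_unit(value, lo, hi):
    """Map value from [lo, hi] onto [-1, 1] with min-max scaling."""
    if hi == lo:
        # All values identical: the formula would divide by zero.
        raise ValueError("lo and hi must differ")
    return 2 * (value - lo) / (hi - lo) - 1


values = [2, 6, 10, 18]  # illustrative data, not taken from the question
lo, hi = min(values), max(values)
scaled = [scale_to_signed_unit(v, lo, hi) for v in values]
print(scaled)  # [-1.0, -0.5, 0.0, 1.0]
```

Note that the result is relative to the smallest and largest values in the data set, so the same raw value can get a different score if the data changes.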
Upvotes: 1
Reputation: 314
I would just set positive sentiment to 1, negative sentiment to -1, and neutral to 0. Then weight each by its percentage to get a composite score.
So for the example mentioned, the score would be
score = positive% * positive_score + neutral% * neutral_score + negative% * negative_score
score = 0.3937 * 1 + 0.4205 * 0 + 0.1858 * (-1)
score = 0.2079
Intuitively this makes sense because if the text were entirely positive we'd have a max score of 1. If it were entirely negative we would have a min score of -1, and if it were entirely neutral we would have a score of 0.
You can use the iterrows function to iterate through all the rows and then write a function to combine those scores into a new column or update an existing column.
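As a rough sketch of the weighted score described above: since the neutral weight is 0, the formula reduces to (positive - negative) divided by the total. The function name composite_score is hypothetical, and dividing by the sum of the three values is an assumption made here so that percentages on a 0 to 100 scale (as in the question) and fractions on a 0 to 1 scale both work.

```python
def composite_score(positive, negative, neutral):
    """Weighted sentiment score in [-1, 1]: +1 positive, 0 neutral, -1 negative."""
    # Normalizing by the total makes the result independent of whether
    # the inputs are percentages (0-100) or fractions (0-1).
    total = positive + negative + neutral
    return (positive - negative) / total


# Values from the example in the question.
score = composite_score(39.37210499227998, 18.57951621204323, 42.04837879567679)
print(round(score, 4))  # approximately 0.2079
```

Because the function only uses arithmetic operators, it should also work column-wise on a DataFrame without looping over rows, for example df['score'] = composite_score(df['positive'], df['negative'], df['neutral']), assuming those columns are numeric as shown in the question.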
Upvotes: 0