Reputation: 497
I have a list of decimal values that I'm adding to a pandas DataFrame. I want to divide the values into three ranges, with each range mapped to one label:
sents = []
for sent in sentis:
    # values <= 0 are skipped entirely
    if sent > 0:
        if sent < 0.40:
            sents.append('negative')
        if 0.40 <= sent <= 0.60:
            sents.append('neutral')
        if sent > 0.60:
            sents.append('positive')
My question is whether there is a more efficient way to do this in pandas, as I'm trying to implement this on a bigger list and
Thanks in advance.
Upvotes: 2
Views: 887
Reputation: 294488
You can use pd.cut to produce results of categorical type with the appropriate labels. To make sure .4 and .6 both fall in the neutral category, I subtract the float machine epsilon from the lower edge and add it to the upper edge.
import numpy as np
import pandas as pd

sentis = np.linspace(0, 1, 11)
eps = np.finfo(float).eps  # machine epsilon for float64

pd.DataFrame(dict(
    Value=sentis,
    Sentiment=pd.cut(
        sentis, [-np.inf, .4 - eps, .6 + eps, np.inf],
        labels=['negative', 'neutral', 'positive']
    ),
))
   Sentiment  Value
0   negative    0.0
1   negative    0.1
2   negative    0.2
3   negative    0.3
4    neutral    0.4
5    neutral    0.5
6    neutral    0.6
7   positive    0.7
8   positive    0.8
9   positive    0.9
10  positive    1.0
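To check how these bin edges treat the boundary values, here is a minimal sketch that wraps the same pd.cut call in a helper and runs it on values at and just around .4 and .6. The name label_sentiment and the sample array are assumptions made for illustration; they are not part of the answer above.

```python
import numpy as np
import pandas as pd


def label_sentiment(values):
    """Label each value with pd.cut (hypothetical helper, not from the answer)."""
    eps = np.finfo(float).eps
    # pd.cut bins are right-closed by default: (-inf, .4-eps], (.4-eps, .6+eps], (.6+eps, inf]
    bins = [-np.inf, 0.4 - eps, 0.6 + eps, np.inf]
    return pd.cut(values, bins, labels=['negative', 'neutral', 'positive'])


# Assumed sample: values just below, at, and just above the boundaries
boundary = np.array([0.39, 0.4, 0.6, 0.61])
labels = label_sentiment(boundary)
print(list(labels))
```

Because the default bins are closed on the right, .6 would land in the neutral bin even without adding eps; it is the subtraction on the lower edge that moves .4 out of the negative bin.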
Upvotes: 2
Reputation: 49893
List comprehension:
['negative' if x < 0.4 else 'positive' if x > 0.6 else 'neutral' for x in sentis]
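As a rough sketch of how this could be attached to a DataFrame, the comprehension can be assigned directly as a column, and np.select gives a vectorized version of the same conditions. The DataFrame df, the column names, and the sample values are assumptions for illustration. Note that, unlike the loop in the question, both versions also label values at or below 0 as negative.

```python
import numpy as np
import pandas as pd

# Assumed sample data; replace with the real list of scores
sentis = [0.1, 0.4, 0.5, 0.6, 0.9]
df = pd.DataFrame({'Value': sentis})

# The list comprehension, assigned as a new column
df['Sentiment'] = [
    'negative' if x < 0.4 else 'positive' if x > 0.6 else 'neutral'
    for x in df['Value']
]

# Vectorized alternative: the first matching condition wins,
# and anything that matches neither falls back to the default
conditions = [df['Value'] < 0.4, df['Value'] > 0.6]
choices = ['negative', 'positive']
df['Sentiment_np'] = np.select(conditions, choices, default='neutral')

print(df)
```

On a large list the np.select version avoids a Python-level loop, which is likely to matter more as the data grows.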
Upvotes: 0