Joe Pearson

Reputation: 51

How to go through a dataframe and classify text as either positive or negative?

I currently have a pandas dataframe which contains tokenized tweets.

I need to go through each tweet and work out whether it is positive or negative, so that I can add a column containing the word "positive" or "negative".

Example data:

tokenized_tweets =  ['football, was, good, we, played, well' , 'We, were, unlucky, today, bad, luck' , 'terrible, performance, bad, game'] 

I need to run a loop through the tokenized_tweets column to work out whether each tweet is positive or negative.

For these examples, the positive and negative words are as follows:

Positive_words = ['good', 'great'] 
Negative_words = ['terrible', 'bad']

The desired output is a dataframe which contains the tweet, how many positive words each tweet contained, how many negative words each tweet contained, and whether the tweet was positive, negative or neutral.

Positive, negative and neutral need to be worked out based on whether a tweet has more positive or more negative buzzwords.

Desired output:

Tokenized tweet                          positive words   negative words   overall
football, was, good, we, played, well    1                0                positive
We, were, unlucky, today, bad, luck      0                1                negative
terrible, performance, bad, game         0                2                negative

Upvotes: 2

Views: 1141

Answers (1)

It_is_Chris

Reputation: 14103

import pandas as pd
import numpy as np

df = pd.DataFrame({'tokenized_tweets': ['football, was, good, we, played, well', 'We, were, unlucky, today, bad, luck','terrible, performance, bad, game']})

Positive_words = ['good', 'great'] 
Negative_words = ['terrible','bad']

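# Count regex matches of 'good|great' (and 'terrible|bad') in each tweet string.
# Note: this also matches inside longer words, e.g. 'good' in 'goodbye'.
df['positive words'] = df['tokenized_tweets'].str.count('|'.join(Positive_words))
df['negative words'] = df['tokenized_tweets'].str.count('|'.join(Negative_words))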

# Compare the two counts row by row; np.select uses the first condition that is True
conditions = [
    (df['positive words'] > df['negative words']),
    (df['negative words'] > df['positive words']),
    (df['negative words'] == df['positive words'])
]

choices = [
    'positive',
    'negative',
    'neutral'
]

df['overall'] = np.select(conditions, choices, default = '')

df

OUT:

    tokenized_tweets                        positive words   negative words   overall
0   football, was, good, we, played, well   1                0                positive
1   We, were, unlucky, today, bad, luck     0                1                negative
2   terrible, performance, bad, game        0                2                negative
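
One caveat with the `str.count('|'.join(...))` approach above: the pattern `good|great` is a plain regex alternation, so it also matches inside longer words such as "goodbye", and it is case-sensitive. If that matters for your data, a possible variant is to wrap the alternation in word boundaries and pass `re.IGNORECASE`. This is only a sketch, not a tested sentiment classifier: the helper name `word_pattern` and the fourth example row are made up for illustration and are not part of the question.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "tokenized_tweets": [
        "football, was, good, we, played, well",
        "We, were, unlucky, today, bad, luck",
        "terrible, performance, bad, game",
        # Hypothetical extra row: "goodbye" contains "good" but is not a positive word
        "goodbye, everyone, see, you, next, season",
    ]
})

positive_words = ["good", "great"]
negative_words = ["terrible", "bad"]


def word_pattern(words):
    """Build a regex that matches any of the given words as a whole word only."""
    # re.escape guards against words that contain regex metacharacters
    return r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"


# Count whole-word matches per tweet, ignoring case
df["positive words"] = df["tokenized_tweets"].str.count(
    word_pattern(positive_words), flags=re.IGNORECASE
)
df["negative words"] = df["tokenized_tweets"].str.count(
    word_pattern(negative_words), flags=re.IGNORECASE
)

# More positive than negative -> positive, the reverse -> negative, a tie -> neutral
diff = df["positive words"] - df["negative words"]
df["overall"] = np.select(
    [diff > 0, diff < 0], ["positive", "negative"], default="neutral"
)

print(df)
```

With the plain `good|great` pattern the extra row would get a positive count of 1; with the word-boundary pattern it gets 0 and falls through to the `default` of "neutral". Using `default="neutral"` also means the tie case does not need its own condition.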

Upvotes: 2
