Henk Straten

Reputation: 1447

Determine a positive / negative word ratio in a tweet set

I have a set of tweets for which I would like to determine the ratio of negative and positive words. I have the following simplified dictionaries:

negative_words = ['bad', 'terrible']
positive_words = ['outstanding', 'good']

I wrote the following code to analyze them:

tweets = ["this is terrible", "this is very good"]

for tweet in tweets:
 count_positive = 0
 count_negative = 0

 if(tweet in positive_words):
  count_positive = count_positive + 1
 if(tweet in negative_words):
  count_negative = count_negative + 1

 ratio_positive = count_positive / len(tweet)
 ratio_negative = count_negative / len(tweet)
 ratio_negative = float(ratio_negative)
 ratio_positive = float(ratio_positive)

 print(ratio_positive)
 print(ratio_negative)

The output of this code should be the ratio of positive and of negative words in each tweet. However, I only get 0.0, while I expect values such as 0.33.

Any thoughts on what goes wrong here?

Upvotes: 2

Views: 1152

Answers (3)

Schmuddi

Reputation: 2086

There are a few issues with your code.

(1) As pointed out in Ivaylo's answer, you need to split the tweet into words. You can do this with tweet.split().

(2) You need to determine the length of the tweet in words, not in characters: len(tweet) for the first tweet gives you 16 because there are 16 characters in this is terrible, but there are 3 words.

(3) In Python 2.x (but not in Python 3.x), an expression like i / j is an integer division as long as all operands are integers, which is the case for your count_positive and count_negative variables as well as for len(tweet). You have to make sure that this is a float division.
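Points (2) and (3) are easy to check in an interpreter. The snippet below is only a quick illustration using the first tweet from the question; the variable names n_chars and n_words are made up for this example.

```python
tweet = "this is terrible"

n_chars = len(tweet)          # counts characters, spaces included
n_words = len(tweet.split())  # counts whitespace-separated words

print(n_chars)  # 16
print(n_words)  # 3

# One negative word out of three words. Converting one operand with
# float() forces a float division in both Python 2 and Python 3.
ratio_negative = float(1) / n_words
print(round(ratio_negative, 2))  # 0.33
```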

Here's a revision of your code that fixes these issues.

# You can use the following line to make Python 2.7 behave like Python 3.x
# with regard to divisions: if you import 'division' from the __future__ 
# module, divisions that use the '/' operator will be float, and divisions
# that use the '//' operator will be integer.
from __future__ import division 

negative_words = ['bad', 'terrible']
positive_words = ['outstanding', 'good']

tweets = ["this is terrible", "this is very good"]

for tweet in tweets:
    # split the tweet into words:
    words = tweet.split() 

    # use list comprehensions to create lists of positive and negative 
    # words in the current tweet, and use 'len()' to get the counts in 
    # each list:
    count_positive = len([w for w in words if w in positive_words])
    count_negative = len([w for w in words if w in negative_words])

    # divide the counts by the number of words:
    ratio_positive = count_positive / len(words)
    ratio_negative = count_negative / len(words)

    print(ratio_positive)
    print(ratio_negative)

Edit: Note that an earlier version used the Counter class from the collections module. This is, in general, a very useful class, but it was overkill in the present case (and didn't quite work yet).

Upvotes: 3

Ivaylo Strandjev

Reputation: 70939

I think what you really want to do is to check if each word in the tweet is positive or negative, while currently you are checking if the whole tweet is in the positive/negative word set. Thus you never find it and both numbers stay at 0.

Instead split the tweet and iterate over its words:

for word in tweet.split():
  if word in positive_words:
    count_positive = count_positive + 1

And similarly for the negative words.

EDIT (as pointed out in Schmuddi's answer): also note that to compute the correct ratio you need to divide by the number of words in the tweet, i.e. len(tweet.split()), rather than by len(tweet), which gives you the number of characters.
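Putting both points together, a minimal sketch could look like the following. The helper name word_ratios and the guard for empty tweets are assumptions added for illustration; they are not part of the question.

```python
negative_words = ['bad', 'terrible']
positive_words = ['outstanding', 'good']
tweets = ["this is terrible", "this is very good"]


def word_ratios(tweet):
    """Return (positive ratio, negative ratio) for one tweet.

    Hypothetical helper, not taken from the question.
    """
    words = tweet.split()
    if not words:
        # Assumption: an empty tweet gets ratios of 0.0 instead of
        # raising a ZeroDivisionError.
        return 0.0, 0.0

    count_positive = 0
    count_negative = 0
    for word in words:
        if word in positive_words:
            count_positive = count_positive + 1
        if word in negative_words:
            count_negative = count_negative + 1

    # float() keeps this a float division in Python 2 as well.
    total = float(len(words))
    return count_positive / total, count_negative / total


for tweet in tweets:
    print(word_ratios(tweet))
```

For the second tweet this gives a positive ratio of 0.25 (one positive word out of four) and a negative ratio of 0.0.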

Upvotes: 3

CaptainTrunky

Reputation: 1707

I assume that you are using Python 2, where dividing two integers with / performs integer division. Apply the float() function to one of the operands to avoid it:

>>> 5 / 2
2
>>> float(5) / 2
2.5
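Note that the code in the question does call float(), but only after the division has already happened, so in Python 2 the fractional part is gone by then. The sketch below shows the difference; it uses // (floor division) to reproduce in any Python version what / does with two integers in Python 2, and the variable names are made up for this example.

```python
count_negative = 1
n_words = 3

# float() applied to the result: the division has already been
# truncated to 0, so this is too late.
too_late = float(count_negative // n_words)

# float() applied to an operand: the division itself is a float division.
in_time = float(count_negative) / n_words

print(too_late)           # 0.0
print(round(in_time, 2))  # 0.33
```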

Upvotes: 1
