Reputation: 11
I am trying to write a program which reads a text file and then sorts it out into whether the comments in it are positive, negative or neutral. I have tried all sorts of ways to do this but each time with no avail. I can search for 1 word with no problems but any more than that and it doesn't work. Also, I have an if statement but I've had to use else twice underneath it as it wouldn't allow me to use elif. Any help with where I'm going wrong would be really appreciated. Thanks in advance.
middle = open("middle_test.txt", "r")
positive = []
negative = [] #the empty lists
neutral = []
pos_words = ["GOOD", "GREAT", "LOVE", "AWESOME"] #the lists I'd like to search
neg_words = ["BAD", "HATE", "SUCKS", "CRAP"]
for tweet in middle:
words = tweet.split()
if pos_words in words: #doesn't work
positive.append(words)
else: #can't use elif for some reason
if 'BAD' in words: #works but is only 1 word not list
negative.append(words)
else:
neutral.append(words)
Upvotes: 0
Views: 735
Reputation: 2166
be careful, open() returns a file object.
>>> f = open('workfile', 'w')
>>> print f
<open file 'workfile', mode 'w' at 80a0960>
Use this:
>>> f.readline()
'This is the first line of the file.\n'
Then use set intersection:
positive += list(set(pos_words) & set(tweet.split()))
Upvotes: 0
Reputation: 122260
Use a Counter
, see http://docs.python.org/2/library/collections.html#collections.Counter:
import urllib2
from collections import Counter
from string import punctuation
# data from http://inclass.kaggle.com/c/si650winter11/data
target_url = "http://goo.gl/oMufKm"
data = urllib2.urlopen(target_url).read()
word_freq = Counter([i.lower().strip(punctuation) for i in data.split()])
pos_words = ["good", "great", "love", "awesome"]
neg_words = ["bad", "hate", "sucks", "crap"]
for i in pos_words:
try:
print i, word_freq[i]
except: # if word not in data
pass
[out]:
good 638
great 1082
love 7716
awesome 2032
Upvotes: 1
Reputation: 54342
You have some problems. At first you can create functions that read comments from file and divides comments into words. Make them and check if they work as you want. Then main procedure can look like:
for comment in get_comments(file_name):
words = get_words(comment)
classified = False
# at first look for negative comment
for neg_word in NEGATIVE_WORDS:
if neg_word in words:
classified = True
negatives.append(comment)
break
# now look for positive
if not classified:
for pos_word in POSITIVE_WORDS:
if pos_word in words:
classified = True
positives.append(comment)
break
if not classified:
neutral.append(comment)
Upvotes: 0
Reputation: 2214
You are not reading the lines from the file. And this line
if pos_words in words:
I think it is checking for the list ["GOOD", "GREAT", "LOVE", "AWESOME"] in words. That is you are looking in the list of words for a list pos_words = ["GOOD", "GREAT", "LOVE", "AWESOME"].
Upvotes: 0
Reputation: 1394
You could use the code below to count the number of positive and negative words in a paragraph:
from collections import Counter
def readwords( filename ):
f = open(filename)
words = [ line.rstrip() for line in f.readlines()]
return words
# >cat positive.txt
# good
# awesome
# >cat negative.txt
# bad
# ugly
positive = readwords('positive.txt')
negative = readwords('negative.txt')
print positive
print negative
paragraph = 'this is really bad and in fact awesome. really awesome.'
count = Counter(paragraph.split())
pos = 0
neg = 0
for key, val in count.iteritems():
key = key.rstrip('.,?!\n') # removing possible punctuation signs
if key in positive:
pos += val
if key in negative:
neg += val
print pos, neg
Upvotes: 0