robblockwood

Reputation: 11

Searching a text file for words from a list

I am trying to write a program that reads a text file and sorts the comments in it into positive, negative, or neutral. I have tried all sorts of ways to do this, but to no avail. I can search for one word without problems, but searching for more than one doesn't work. Also, I have an if statement, but I had to use else twice underneath it because it wouldn't let me use elif. Any help with where I'm going wrong would be really appreciated. Thanks in advance.

middle = open("middle_test.txt", "r")
positive = []
negative = []                                        #the empty lists
neutral = []

pos_words = ["GOOD", "GREAT", "LOVE", "AWESOME"]    #the lists I'd like to search
neg_words = ["BAD", "HATE", "SUCKS", "CRAP"]

for tweet in middle:
    words = tweet.split()
    if pos_words in words:                           #doesn't work
        positive.append(words)        
    else:                                            #can't use elif for some reason
        if 'BAD' in words:                           #works but is only 1 word not list
            negative.append(words)
        else:
            neutral.append(words)

Upvotes: 0

Views: 735

Answers (5)

Mihai Zamfir

Reputation: 2166

Be careful: open() returns a file object, not the contents of the file.

>>> f = open('workfile', 'w')
>>> print f
<open file 'workfile', mode 'w' at 80a0960>

To read a line from the file object, call readline():

>>> f.readline()
'This is the first line of the file.\n'

Then use set intersection to collect the positive words that appear in a tweet:

positive += list(set(pos_words) & set(tweet.split())) 
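
As a rough sketch of how the set intersection idea could classify a whole line rather than collect the matched words: the function name classify and the upper-casing step are assumptions for illustration, not part of the question's code. Punctuation is not stripped here, so a token like "LOVE!" would not match.

```python
# Word lists from the question, stored as sets so they can be intersected.
POS_WORDS = {"GOOD", "GREAT", "LOVE", "AWESOME"}
NEG_WORDS = {"BAD", "HATE", "SUCKS", "CRAP"}

def classify(tweet):
    """Hypothetical helper: label one line of text by set intersection."""
    # Upper-case the text so matching ignores case (an assumption about the data).
    words = set(tweet.upper().split())
    if words & POS_WORDS:       # a non-empty intersection is truthy
        return "positive"
    elif words & NEG_WORDS:
        return "negative"
    return "neutral"
```

Positive words are checked first, so a line containing both kinds of word is labelled positive.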

Upvotes: 0

alvas

Reputation: 122260

Use a Counter (see http://docs.python.org/2/library/collections.html#collections.Counter):

import urllib2
from collections import Counter
from string import punctuation

# data from http://inclass.kaggle.com/c/si650winter11/data
target_url = "http://goo.gl/oMufKm" 
data = urllib2.urlopen(target_url).read()

word_freq = Counter([i.lower().strip(punctuation) for i in data.split()])

pos_words = ["good", "great", "love", "awesome"]
neg_words = ["bad", "hate", "sucks", "crap"]

for i in pos_words:
    # a Counter returns 0 for words not in data, so no try/except is needed
    print i, word_freq[i]

[out]:

good 638
great 1082
love 7716
awesome 2032

Upvotes: 1

Michał Niklas

Reputation: 54342

You have a few problems. First, create functions that read the comments from the file and split each comment into words. Write them and check that they work as you want. Then the main procedure can look like this:

for comment in get_comments(file_name):
    words = get_words(comment)
    classified = False
    # first, look for a negative word
    for neg_word in NEGATIVE_WORDS:
        if neg_word in words:
            classified = True
            negatives.append(comment)
            break
    # now look for positive
    if not classified:
        for pos_word in POSITIVE_WORDS:
            if pos_word in words:
                classified = True
                positives.append(comment)
                break
    if not classified:
        neutral.append(comment)
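
The helpers get_comments and get_words are not defined in this answer. One possible sketch, assuming one comment per line of the file and word lists written in upper case as in the question (both are assumptions, not something the answer states):

```python
import string

def get_comments(file_name):
    """Yield one comment per non-empty line of the file."""
    with open(file_name) as f:
        for line in f:
            line = line.strip()
            if line:
                yield line

def get_words(comment):
    """Return the comment's words, upper-cased, without surrounding punctuation."""
    return [w.strip(string.punctuation).upper() for w in comment.split()]
```

Stripping punctuation means that "love!" and "LOVE" both become "LOVE", so either would match an entry in an upper-case word list.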

Upvotes: 0

Dineshs91

Reputation: 2214

You are not reading the lines from the file. And this line

if pos_words in words:

checks whether the list ["GOOD", "GREAT", "LOVE", "AWESOME"] itself is an element of words. That is, you are looking in the list of words for the whole list pos_words, not for any of the individual words in it.
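
A minimal sketch of one way to test each word individually, using the built-in any(). The tweets list is made-up sample data standing in for the lines of the file. Note that elif is valid here as long as it sits at the same indentation level as the if it belongs to.

```python
pos_words = ["GOOD", "GREAT", "LOVE", "AWESOME"]
neg_words = ["BAD", "HATE", "SUCKS", "CRAP"]

positive, negative, neutral = [], [], []

# Sample data standing in for the lines of the file (not from the question).
tweets = ["THIS IS GREAT", "I HATE MONDAYS", "JUST A TWEET"]

for tweet in tweets:
    words = tweet.split()
    # any() is True as soon as one word from the list is found in words
    if any(w in words for w in pos_words):
        positive.append(words)
    elif any(w in words for w in neg_words):
        negative.append(words)
    else:
        neutral.append(words)
```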

Upvotes: 0

Colin Bernet

Reputation: 1394

You could use the code below to count the number of positive and negative words in a paragraph:

from collections import Counter

def readwords(filename):
    # read one word per line; the with block closes the file afterwards
    with open(filename) as f:
        return [line.rstrip() for line in f]

# >cat positive.txt 
# good
# awesome
# >cat negative.txt 
# bad
# ugly

positive = readwords('positive.txt')
negative = readwords('negative.txt')

print positive
print negative

paragraph = 'this is really bad and in fact awesome. really awesome.'

count = Counter(paragraph.split())

pos = 0
neg = 0
for key, val in count.iteritems():
    key = key.rstrip('.,?!\n') # removing possible punctuation signs
    if key in positive:
        pos += val
    if key in negative:
        neg += val

print pos, neg

Upvotes: 0
