Kevin Peralta

Reputation: 13

Separate words into list, except for symbols

I'm working on a project where I receive a list of tweets (Twitter) and check whether their words appear in a dictionary that maps certain words to values. My code already splits a tweet into words, but I don't know how to strip out symbols such as , . " and :

Here's the code:

def getTweet(tweet, dictionary):
    score = 0
    seperate = tweet.split(' ')
    print seperate
    print "------"
    if(len(tweet) > 0):
        for item in seperate:
            if item in dictionary:
                print item
                score = score + int(dictionary[item])
        print "here's the score: " + str(score)
        return score
    else:
        print "you haven't tweeted a tweet"
        return 0

Here's the parameter/tweet that will be checked:

getTweet("you are the best loyal friendly happy cool nice", scoresDict)

Any ideas?

Upvotes: 0

Views: 119

Answers (2)

darmat

Reputation: 728

If you want to get rid of all the non-alphanumeric characters, you can try:

import re
re.sub(r'[^\w]', ' ', string)

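The character class [^\w] matches any character that is not a letter, digit, or underscore, so every such character is replaced with a space. That will do the trick for you!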
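
As a rough sketch of how this could fit the question, the cleaned string can then be split on whitespace. The helper name tweet_words and the sample tweet are made up for illustration and are not from the original code. Calling split() with no argument, rather than split(' '), avoids the empty strings that runs of spaces would otherwise leave in the list.

```python
import re

def tweet_words(tweet):
    # Hypothetical helper: replace every character that is not a letter,
    # digit, or underscore with a space.
    cleaned = re.sub(r'[^\w]', ' ', tweet)
    # split() with no argument collapses runs of whitespace.
    return cleaned.split()

print(tweet_words('you are "the best", loyal: friendly.'))
# ['you', 'are', 'the', 'best', 'loyal', 'friendly']
```

The print call is written with parentheses so the snippet also runs on Python 3.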

Upvotes: 1

tom10

Reputation: 69192

Before doing the split, replace the characters with spaces, and then split on the spaces.

import re

line = '  a.,b"c'
line = re.sub('[,."]', ' ', line)

print line  # '  a  b c'
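
Applied to the scoring function from the question, the same idea might look like the sketch below. The name score_tweet and the sample_scores dictionary are assumptions for illustration only; the real scoresDict is not shown in the question. Other symbols, such as a colon, can be added inside the square brackets if needed.

```python
import re

def score_tweet(tweet, dictionary):
    # Replace the unwanted symbols with spaces before splitting.
    cleaned = re.sub('[,."]', ' ', tweet)
    score = 0
    # split() with no argument skips the empty strings left by repeated spaces.
    for word in cleaned.split():
        if word in dictionary:
            score += int(dictionary[word])
    return score

# Hypothetical scores, made up for this example.
sample_scores = {'best': 3, 'happy': 2, 'nice': '1'}
print(score_tweet('you are the "best", happy. nice', sample_scores))  # 6
```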

Upvotes: 0
