Reputation: 3
I am trying to get this Python code to strip the punctuation attached to words and then count the unique words. For some reason it still counts "hello." and "hello" as separate words. Any help would be most appreciated.
def word_distribution(words):
    word_dict = {}
    words = words.lower()
    words = words.split()
    for word in words:
        if ord('a') <= ord(word[-1]) <= ord('z'):
            pass
        elif ord('A') <= ord(word[-1]) <= ord('Z'):
            pass
        else:
            word[:-1]
    word_dict = {word:words.count(word)+1 for word in set(words)}
    return(word_dict)
Upvotes: 0
Views: 86
Reputation: 1680
I don't know why you're adding 1 to count.
def word_distribution(words):
    word_dict = {}
    words = words.lower().split()
    for word in words:
        if ord('a') <= ord(word[-1]) <= ord('z'):
            pass
        elif ord('A') <= ord(word[-1]) <= ord('Z'):
            pass
    word_dict = {word:words.count(word) for word in set(words)}
    return(word_dict)
{'hello': 2, 'my': 1, 'name': 1, 'is': 1}
Edit:
As brianpck points out:
def word_distribution(words):
    word_dict = {}
    words = words.lower().split()
    word_dict = {word:words.count(word) for word in set(words)}
    return(word_dict)
This will also give the same result.
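One caveat worth checking: neither version above removes punctuation, so the output shown only holds when the input contains none. Here is a minimal sketch, assuming a sample sentence of my own that ends in a period. The helper name `word_distribution_stripped` is hypothetical and uses `str.strip(string.punctuation)` on each token:

```python
import string

def word_distribution(words):
    # Version from this answer: lowercases and splits, but keeps punctuation.
    words = words.lower().split()
    return {word: words.count(word) for word in set(words)}

def word_distribution_stripped(words):
    # Hypothetical variant: strip leading/trailing punctuation from each token first.
    tokens = [w.strip(string.punctuation) for w in words.lower().split()]
    tokens = [t for t in tokens if t]  # drop tokens that were only punctuation
    return {t: tokens.count(t) for t in set(tokens)}

sample = "Hello my name is hello."
print(word_distribution(sample))           # 'hello' and 'hello.' are separate keys
print(word_distribution_stripped(sample))  # both occurrences are counted under 'hello'
```

With this input the first call still reports "hello." and "hello" separately, which is the behaviour the question describes.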
Upvotes: 1
Reputation: 1877
There are certainly better ways of achieving what you are trying to do, but this answer fixes your code.
Strings are immutable and lists are mutable. Nowhere in your code were you modifying the list, and the bare expression word[:-1]
has no effect: slicing returns a new string rather than changing the original, and you never assigned that result to anything.
def word_distribution(words):
    word_dict = {}
    words = words.lower()
    words = words.split()
    for word in words:
        index = words.index(word)
        if ord('a') <= ord(word[-1]) <= ord('z'):
            pass
        elif ord('A') <= ord(word[-1]) <= ord('Z'):
            pass
        else:
            word = word[:-1]
            words[index] = word
    word_dict = {word:words.count(word) for word in set(words)}
    return(word_dict)
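The same fix can be written without the `words.index(word)` lookup, which rescans the list on every iteration. This is a sketch only, assuming that at most one trailing non-letter character needs removing, as in the code above. The name `word_distribution_enumerate` is mine and not from the question:

```python
def word_distribution_enumerate(text):
    words = text.lower().split()
    for i, word in enumerate(words):
        # Slicing returns a new string, so the result has to be stored
        # back into the list for the change to be visible later.
        if not ('a' <= word[-1] <= 'z'):
            words[i] = word[:-1]
    return {word: words.count(word) for word in set(words)}

print(word_distribution_enumerate("Hello my name is hello."))
```

Because the text is lowercased first, only the lowercase range needs to be checked.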
Upvotes: 1
Reputation: 12972
You are making it too complicated. As Sohier Dane mentioned in the comments, you can use the approach from the other post to remove punctuation and simplify the script to:
import string
def word_distribution(words):
    words = words.translate(None, string.punctuation).lower()
    d = {}
    for w in words.split():
        if w not in d.keys():
            d[w] = 1
        else:
            d[w] += 1
    return d
Results:
>>> x='Hello My Name Is hello.'
>>> print word_distribution(x)
{'is': 1, 'my': 1, 'hello': 2, 'name': 1}
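The two-argument `translate(None, ...)` call and the `print` statement above are Python 2 only. A rough Python 3 equivalent, assuming `str.maketrans` to build the deletion table and `collections.Counter` for the tally (the name `word_distribution_py3` is hypothetical), might look like this:

```python
import string
from collections import Counter

def word_distribution_py3(text):
    # The third argument of str.maketrans lists the characters to delete.
    table = str.maketrans('', '', string.punctuation)
    cleaned = text.translate(table).lower()
    # Counter tallies each word; dict() turns it back into a plain mapping.
    return dict(Counter(cleaned.split()))

x = 'Hello My Name Is hello.'
print(word_distribution_py3(x))
```

Note that this removes punctuation everywhere in the text, including inside words, so "don't" becomes "dont".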
Upvotes: 1