Reputation: 27
I'm working on a project whose goal is to count the words in a string (unigrams). One obvious approach is to split the string into a list of words, check which list items repeat, and finally store each word as a key of a dictionary with the number of times it occurs as the value. I tried this, but I get the error message "list indices must be integers or slices, not str". How can I fix this? (Code below.)
words = content_string.lower()
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '()','-']
words = "".join(i if i not in punctuation else "" for i in words)
words = words.split()
i = 0
counts = dict()
for i in words:
    if words[i] in counts:
        counts[words[i]] += 1
    else:
        counts[words[i]] = 1
sorted_counts = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)
for i in len(range(9)):
    print(count[i])
Upvotes: 0
Views: 67
Reputation: 3465
Use collections.Counter (from the collections module in the Python standard library):
import collections
from pprint import pprint
content_string = 'I am having a project of which the goal is to count words in a string (unigrams). One of the most obvious ways to approach this is by splitting the string up to lists and then have the program run so it can see if any list items are the same; finally, put the word as the key of a dictionary, and the times of repetition as the key of the dictionary. I did this, but the error message appears of "list indices must be integers or slices, not str". What are some ways to fix this problem (code below).'
words = content_string.lower()
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(',')','-']
words = "".join(i if i not in punctuation else "" for i in words)
words = words.split()
word_count = collections.Counter()
for word in words:
    word_count[word] += 1
pprint(word_count.most_common())
Result:
[('the', 11),
('of', 6),
('to', 4),
('a', 3),
('this', 3),
('i', 2),
('is', 2),
('string', 2),
('ways', 2),
('and', 2),
('list', 2),
('are', 2),
('as', 2),
('key', 2),
('dictionary', 2),
('am', 1),
('having', 1),
('project', 1),
('which', 1),
('goal', 1),
('count', 1),
('words', 1),
('in', 1),
('unigrams', 1),
('one', 1),
('most', 1),
('obvious', 1),
('approach', 1),
('by', 1),
('splitting', 1),
('up', 1),
('lists', 1),
('then', 1),
('have', 1),
('program', 1),
('run', 1),
('so', 1),
('it', 1),
('can', 1),
('see', 1),
('if', 1),
('any', 1),
('items', 1),
('same', 1),
('finally', 1),
('put', 1),
('word', 1),
('times', 1),
('repetition', 1),
('did', 1),
('but', 1),
('error', 1),
('message', 1),
('appears', 1),
('indices', 1),
('must', 1),
('be', 1),
('integers', 1),
('or', 1),
('slices', 1),
('not', 1),
('str', 1),
('what', 1),
('some', 1),
('fix', 1),
('problem', 1),
('code', 1),
('below', 1)]
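One detail in the code above: the punctuation list has '(' and ')' as separate entries, because the character-by-character test can never match a two-character string such as '()'. As a rough sketch of the same stripping step, str.translate can delete the characters in one pass. The helper name strip_punctuation and the sample sentence are made up for illustration, and string.punctuation covers more characters than the list used above, so adjust it if you only want to drop a specific set.

```python
import string

def strip_punctuation(text):
    """Return text with every ASCII punctuation character removed."""
    # With three arguments, str.maketrans maps each character in the
    # third string to None, which tells translate() to delete it.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)

# Hypothetical sample input, not taken from the question.
cleaned = strip_punctuation('Count (unigrams), "quoted" words!').lower().split()
print(cleaned)
```

As with the join-based version, a hyphen is simply deleted, so a hyphenated word is merged into one token.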
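P.S. About your loop, for i in words:
Here i is actually a word, not an index, so words[i] indexes the list with a string, which is what raises the TypeError. If you want both the index and the word, you can write for i, word in enumerate(words):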
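To make the difference concrete, here is a small sketch. The sample list is made up for illustration; the point is only what the loop variable holds in each case.

```python
words = ["the", "cat", "the", "hat"]  # hypothetical sample data

# Iterating over a list yields the items themselves, not their positions.
items = [w for w in words]

# enumerate() yields (index, item) pairs when the position is needed as well.
pairs = list(enumerate(words))

print(items)
print(pairs)
```

So inside for i in words: you would test i in counts directly rather than words[i].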
However, as you can see, using Counter solves the problem much more concisely.
That said, without Counter you can solve it as follows:
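It can be shortened further, since Counter accepts any iterable directly and most_common takes an optional limit. A minimal sketch, where top_words and the sample sentence are hypothetical names and data used only for illustration:

```python
from collections import Counter

def top_words(words, n=10):
    """Return the n most frequent words as (word, count) pairs."""
    # Counter(words) does the counting loop internally;
    # most_common(n) returns the pairs sorted by descending count.
    return Counter(words).most_common(n)

sample = "the cat and the hat and the bat".split()
top_two = top_words(sample, 2)
print(top_two)  # → [('the', 3), ('and', 2)]
```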
from pprint import pprint
content_string = 'I am having a project of which the goal is to count words in a string (unigrams). One of the most obvious ways to approach this is by splitting the string up to lists and then have the program run so it can see if any list items are the same; finally, put the word as the key of a dictionary, and the times of repetition as the key of the dictionary. I did this, but the error message appears of "list indices must be integers or slices, not str". What are some ways to fix this problem (code below).'
words = content_string.lower()
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(',')','-']
words = "".join(i if i not in punctuation else "" for i in words)
words = words.split()
word_count = {}
for word in words:
    try:
        word_count[word] += 1
    except KeyError:
        word_count[word] = 1
word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
pprint(word_count)
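If you prefer to avoid try/except, dict.get with a default of 0 does the same job. The sketch below also prints only the first nine entries, which appears to be what the last loop in the question was aiming for; count_words and the sample sentence are hypothetical and only there for illustration.

```python
def count_words(words):
    """Count occurrences of each word using a plain dictionary."""
    counts = {}
    for word in words:
        # get() returns 0 for a word that has not been seen yet.
        counts[word] = counts.get(word, 0) + 1
    return counts

sample = "one two two three three three".split()
ranked = sorted(count_words(sample).items(), key=lambda kv: kv[1], reverse=True)

# Slicing is safe even when there are fewer than nine distinct words.
for word, n in ranked[:9]:
    print(word, n)
```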
Upvotes: 2