JeyuLeoChou

Reputation: 27

Word Counting in lists

I am working on a project whose goal is to count the words in a string (unigrams). One obvious approach is to split the string into a list, check which list items repeat, and then store each word as a dictionary key with its number of occurrences as the value. I tried this, but I get the error "list indices must be integers or slices, not str". How can I fix this problem? (Code below.)

words = content_string.lower()
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '()','-']
words = "".join(i if i not in punctuation else "" for i in words)
words = words.split()

i = 0
counts = dict()

for i in words:
    if words[i] in counts:
        counts[words[i]] += 1
    else:
        counts[words[i]] =1

sorted_counts = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)
for i in len(range(9)):
    print(count[i])

Upvotes: 0

Views: 67

Answers (1)

Bruno Vermeulen

Reputation: 3465

Use collections.Counter (see the collections module in the Python standard library documentation):

import collections
from pprint import pprint

content_string = 'I am having a project of which the goal is to count words in a string (unigrams). One of the most obvious ways to approach this is by splitting the string up to lists and then have the program run so it can see if any list items are the same; finally, put the word as the key of a dictionary, and the times of repetition as the key of the dictionary. I did this, but the error message appears of "list indices must be integers or slices, not str". What are some ways to fix this problem (code below).'

words = content_string.lower()
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(',')','-']
words = "".join(i if i not in punctuation else "" for i in words)
words = words.split()

word_count = collections.Counter()
for word in words:
    word_count[word] += 1

pprint(word_count.most_common())

Result:

[('the', 11),
 ('of', 6),
 ('to', 4),
 ('a', 3),
 ('this', 3),
 ('i', 2),
 ('is', 2),
 ('string', 2),
 ('ways', 2),
 ('and', 2),
 ('list', 2),
 ('are', 2),
 ('as', 2),
 ('key', 2),
 ('dictionary', 2),
 ('am', 1),
 ('having', 1),
 ('project', 1),
 ('which', 1),
 ('goal', 1),
 ('count', 1),
 ('words', 1),
 ('in', 1),
 ('unigrams', 1),
 ('one', 1),
 ('most', 1),
 ('obvious', 1),
 ('approach', 1),
 ('by', 1),
 ('splitting', 1),
 ('up', 1),
 ('lists', 1),
 ('then', 1),
 ('have', 1),
 ('program', 1),
 ('run', 1),
 ('so', 1),
 ('it', 1),
 ('can', 1),
 ('see', 1),
 ('if', 1),
 ('any', 1),
 ('items', 1),
 ('same', 1),
 ('finally', 1),
 ('put', 1),
 ('word', 1),
 ('times', 1),
 ('repetition', 1),
 ('did', 1),
 ('but', 1),
 ('error', 1),
 ('message', 1),
 ('appears', 1),
 ('indices', 1),
 ('must', 1),
 ('be', 1),
 ('integers', 1),
 ('or', 1),
 ('slices', 1),
 ('not', 1),
 ('str', 1),
 ('what', 1),
 ('some', 1),
 ('fix', 1),
 ('problem', 1),
 ('code', 1),
 ('below', 1)]
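
As a side note, Counter accepts any iterable, so the explicit loop is optional, and most_common takes an optional count that limits the result to the top entries. The following is a minimal sketch of that shorter form; the sample text and the limit of 2 are made up for illustration only.

```python
from collections import Counter

# Hypothetical sample text, chosen only for illustration.
text = "the cat and the dog and the bird"
words = text.split()

# Counter counts the items of any iterable in a single call.
word_count = Counter(words)

# most_common(n) returns the n highest counts as (word, count) pairs.
top_two = word_count.most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]
```

Passing a larger number, such as 9, would give the top nine words that the question's final loop appears to be after.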

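PS. In for i in words: the variable i is actually a word, not an index, so words[i] tries to index the list with a string, which raises the "list indices must be integers or slices, not str" error. If you want both the index and the word, you can write for i, word in enumerate(words): instead. However, as you can see, Counter solves the problem much more concisely.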
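
Applied to the loop from the question, that point might look like the sketch below. It is one possible correction rather than the only one. The sample content_string is hypothetical, since the question does not show its input, and the slice [:9] is an assumption about what the final range(9) loop was meant to print.

```python
import operator

# Hypothetical sample input; the question's content_string is not shown.
content_string = "One fish, two fish; red fish, blue fish!"

words = content_string.lower()
# Each bracket is listed separately: a two-character entry such as '()'
# can never equal a single character, so it would filter nothing.
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(', ')', '-']
words = "".join(ch for ch in words if ch not in punctuation)
words = words.split()

counts = {}
for word in words:  # `word` is the list item itself, not an index
    if word in counts:
        counts[word] += 1
    else:
        counts[word] = 1

sorted_counts = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)
# Slicing caps the output at nine entries and is safe for shorter lists.
for word, count in sorted_counts[:9]:
    print(word, count)
```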

Without Counter, you can solve it as follows:

from pprint import pprint

content_string = 'I am having a project of which the goal is to count words in a string (unigrams). One of the most obvious ways to approach this is by splitting the string up to lists and then have the program run so it can see if any list items are the same; finally, put the word as the key of a dictionary, and the times of repetition as the key of the dictionary. I did this, but the error message appears of "list indices must be integers or slices, not str". What are some ways to fix this problem (code below).'

words = content_string.lower()
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(',')','-']
words = "".join(i if i not in punctuation else "" for i in words)
words = words.split()

word_count = {}

for word in words:
    try:
        word_count[word] += 1
    except KeyError:
        word_count[word] = 1

word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
pprint(word_count)
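
A variation on the same idea, offered only as a sketch: dict.get with a default of 0 avoids the try/except. The word list here is made up and stands in for the cleaned words list.

```python
# Hypothetical word list, standing in for the cleaned `words` list.
words = ["to", "be", "or", "not", "to", "be"]

word_count = {}
for word in words:
    # get() returns 0 for a word not seen yet, so no KeyError can occur.
    word_count[word] = word_count.get(word, 0) + 1

print(word_count)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```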

Upvotes: 2
