Reputation: 451
I am trying to make a table from a data set that should give me the words in the data set and the number of times they are repeated.
So for example:
dataset : { moon, moon, moon, sun }
table (final result):
('moon') ==> 3
('sun') ==> 1
I thought I would use a dictionary keyed by word: if a word found during the iteration is already a key, it should not be added again to the dictionary (which represents the table); instead, its numeric value should be incremented.
word_table = {}
for word in document.split():
    if word in word_table:
        word_table[word, somevalue] += 1
    else:
        word_table[word, somevalue] = 1
The somevalue is a secondary key that I store together with the word. Its value can be yes or no. I mention it because I am not sure whether it causes the problem (or, more precisely, whether it affects which key gets compared).
When I print the whole dictionary, I get a long list of words as keys (the program does not detect repetitions), and every counter is 1.
Output:
('moon', 'yes') ==> 1
('moon', 'yes') ==> 1
('moon', 'yes') ==> 1
.........
Is there another approach or data structure that I should use for this particular case? Or is it just the code?
Upvotes: 0
Views: 55
Reputation: 18697
collections.Counter is probably what you are looking for:
>>> from collections import Counter
>>> Counter("moon,moon,moon,sun".split(","))
Counter({'moon': 3, 'sun': 1})
Upvotes: 4
Reputation: 77857
I believe the structure you need is the Counter class from the collections module. You simply feed it your list of words, and you get a dictionary-like object that maps each word to its count.
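A minimal sketch of that suggestion, assuming the text lives in a string named document (the name is borrowed from the question; the sample contents here are made up) and that words are separated by whitespace:

```python
from collections import Counter

# Assumed sample input; the question's real `document` is not shown.
document = "moon moon moon sun"

# Counter is a dict subclass: it maps each word to its number of occurrences.
word_table = Counter(document.split())

# Print the table in the format used in the question.
for word, count in word_table.items():
    print(f"({word!r}) ==> {count}")
```

Because Counter behaves like a dictionary, word_table["moon"] looks up a count directly, and a word that never appeared returns 0 instead of raising KeyError.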
Upvotes: 2
Reputation: 2022
It's fine for your dictionary key to be a tuple (though your post doesn't make clear why you want to do this). All you need to do is make sure that you check for that tuple in your if statement:
word_table = {}
for word in document.split():
    if (word, somevalue) in word_table:
        word_table[word, somevalue] += 1
    else:
        word_table[word, somevalue] = 1
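To see why the original membership test never matched, here is a small sketch that builds the same tuple-keyed table with Counter. The tagged_words list is hypothetical, since the question does not show where somevalue comes from:

```python
from collections import Counter

# Hypothetical input: each word paired with its 'yes'/'no' flag.
tagged_words = [("moon", "yes"), ("moon", "yes"), ("moon", "no"), ("sun", "yes")]

# Tuples are hashable, so each (word, flag) pair is counted as its own key.
word_table = Counter(tagged_words)

# A bare word is never a key here, which is why `word in word_table` was always False.
print("moon" in word_table)           # False
print(("moon", "yes") in word_table)  # True
print(word_table["moon", "yes"])      # 2
```

Note that ("moon", "yes") and ("moon", "no") are counted separately; if only the word matters, key the table on the word alone.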
Upvotes: 2