lanrete

Reputation: 127

Word frequency with dictionary comprehension

I was trying to use a dictionary to count word frequency on a given string. Say:

s = 'I ate an apple a big apple'

I understand the best way to count word frequency is probably to use collections.Counter. But I want to know if I can solve this by using a dictionary comprehension.

My original method (without a dictionary comprehension) was:

dict = {}
for token in s.split(" "):
    dict[token] = dict.get(token, 0) + 1

and it works fine:

dict
{'I': 1, 'a': 1, 'an': 1, 'apple': 2, 'ate': 1, 'big': 1}

I tried to do the same thing with a dictionary comprehension, like this:

dict = {}
dict = {token: dict.get(token, 0) + 1 for token in s.split(" ")}

But this didn't work.

dict
{'I': 1, 'a': 1, 'an': 1, 'apple': 1, 'ate': 1, 'big': 1}

What's wrong with the dictionary comprehension? Is it because I refer to dict inside the comprehension itself, so that every call to dict.get('apple', 0) returns 0? I don't know how to test this, so I am not 100% sure.

P.S. If it makes any difference, I am using Python 3.

Upvotes: 2

Views: 2401

Answers (3)

SergiyKolesnikov

Reputation: 7815

For your dictionary comprehension to work, you need a reference to the comprehension inside itself. Something like this would work

{token: __me__.get(token, 0) + 1 for token in s.split(" ")}

if there were such a thing as '__me__' referencing the comprehension being built. In Python 3 there is no documented way to do this.

According to this answer, an undocumented "implementation artifact" (on which Python users should not rely) can be used in Python 2.5 and 2.6 to write a self-referencing list comprehension. Maybe a similar hack exists for dictionary comprehensions in Python 3 too.

Upvotes: 1

Ivan Chaer

Reputation: 7100

You could also use list.count(), as:

s = 'I ate an apple a big apple'

print({token: s.split().count(token) for token in set(s.split())})
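
A slightly tidier sketch of the same idea splits the string only once and reuses the list. The names tokens and counts are just illustrative. Note that list.count() rescans the whole list for every distinct word, so this is fine for short strings but probably slower than collections.Counter on large inputs.

```python
s = 'I ate an apple a big apple'

# Split once and reuse the resulting list instead of calling s.split() repeatedly.
tokens = s.split()

# Iterate over the distinct words; count each one's occurrences in the full list.
counts = {token: tokens.count(token) for token in set(tokens)}

print(counts)
```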

Upvotes: 1

Daniel Roseman

Reputation: 599956

If you go through your code operation by operation, you will see what is wrong.

First you set dict to an empty dict. (As mentioned in the comments, it's a bad idea to use that for your own variable name, but that's not the problem here.)

Secondly, your dict comprehension is evaluated. At this point the name dict still refers to the empty dict. So every time you do dict.get(whatever, 0), it will always get the default.

Finally, your populated dict is reassigned to the name dict, replacing the empty one that was previously there.
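
One way to check this yourself is sketched below. The helper lookup and the list seen_sizes are hypothetical names added only for the experiment, and I have used counts rather than dict as the variable name. The helper records how many entries the outer dictionary holds each time the comprehension consults it.

```python
s = 'I ate an apple a big apple'

counts = {}
seen_sizes = []  # size of `counts` at the moment of each lookup

def lookup(token):
    # `counts` here is whatever the name refers to right now,
    # which is still the empty dict while the comprehension is running.
    seen_sizes.append(len(counts))
    return counts.get(token, 0) + 1

# The name `counts` is only rebound after the whole comprehension has finished.
counts = {token: lookup(token) for token in s.split(" ")}

print(seen_sizes)
print(counts['apple'])
```

If the explanation above is right, every recorded size should be 0, and 'apple' should end up with a count of 1 even though it appears twice.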

Upvotes: 2
