Rajeev

Reputation: 46899

Count number of words in a file using dictionary comprehension - python

In the following code I am trying to use a dictionary comprehension to count how many times each word occurs, duplicates included, but it results in {'count': 1, 'words.As': 1, 'said': 1, 'file.\n': 1, 'this': 1, 'text': 1, 'is': 1, 'of': 1, 'some': 1, ',i': 1, 'to': 1, 'only': 1, 'Hi': 1, 'a': 1, 'file': 1, 'recognize': 1, 'test': 1, 'the': 1, 'repeat': 1, 'before': 1}

I do not see 'is' counted twice, or any of the other repeated words for that matter. What am I doing wrong here?

test_readme.txt

Hi this is some text to recognize the count of words.As said before this is only a test file ,i repeat test file.

with open('test_readme.txt') as f:
   di = { w : di[w]+1 if w in di else 1  for l in f for w in l.split(' ')}
print di

Upvotes: 0

Views: 2155

Answers (5)

Ilja Everilä

Reputation: 52929

Yet another Counter solution: it processes the whole file in a single call to Counter, which consumes the words one at a time from a nested generator expression:

from collections import Counter

with open('test_readme.txt') as f:
    counts = Counter(word for line in f for word in line.strip().split())

As already pointed out, you can't access the variable being assigned from within the expression that produces its value; in other words, the intermediate results of an expression are not visible under that name. The expression is evaluated first, and only then is the result stored. Since a dictionary comprehension is a single expression, it is evaluated in full before the result is bound to di.
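To make this concrete, here is a minimal sketch. The function names and the sample word list are illustrative, not from the question. It assumes a dict named di already exists and is empty when the comprehension runs, which would explain why the question's output shows every count as 1 instead of a NameError.

```python
def count_with_comprehension(words):
    # Assumption: di already exists (empty) before the comprehension runs.
    di = {}
    # Inside the comprehension, di still refers to the old empty dict, so
    # "w in di" is always False and every word gets the value 1.
    di = {w: di[w] + 1 if w in di else 1 for w in words}
    return di


def count_with_loop(words):
    # A plain loop updates the dict as it goes, so repeats are counted.
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


words = "this is a test this is".split()
print(count_with_comprehension(words))
print(count_with_loop(words))
```

The first call reports 1 for every word; the second reports 2 for 'this' and 'is'.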

Upvotes: 1

Kasravnd

Reputation: 107287

You can't use a dictionary comprehension for this, because di doesn't change during its creation, and your code will raise a NameError if you haven't defined the dictionary already.

>>> s = """Hi this is some text to recognize the count of words.
... As said before this is only a test file ,i repeat test file."""
>>> 
>>> di = { w : di[w]+1 if w in di else 1 for l in s.split('\n') for w in l.split(' ')}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <dictcomp>
NameError: global name 'di' is not defined

You can use defaultdict or Counter from the collections module:

from collections import defaultdict

di = defaultdict(int)
with open('test_readme.txt') as f:
   for line in f:
       for w in line.strip().split():
           di[w]+=1

Demo :

>>> for line in s.split('\n'):
...    for w in line.strip().split():
...            di[w]+=1
... 
>>> di
defaultdict(<type 'int'>, {'count': 1, 'a': 1, 'said': 1, 'words.': 1, 'this': 2, 'text': 1, 'is': 2, 'of': 1, 'some': 1, 'only': 1, ',i': 1, 'to': 1, 'As': 1, 'Hi': 1, 'file': 1, 'recognize': 1, 'test': 2, 'the': 1, 'file.': 1, 'repeat': 1, 'before': 1})
>>> 

Upvotes: 2

Pallavi_Kalluri

Reputation: 21

A very readable solution would be

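Thedict = {}
# the with block closes the file automatically
with open('sample.txt') as fo:
    for line in fo:
        for word in line.split(' '):
            # strip whitespace first, so a trailing newline does not hide the full stop
            word = word.strip().strip('.')
            if word in Thedict:
                Thedict[word] = Thedict[word] + 1
            else:
                # the first occurrence counts as 1, not 0
                Thedict[word] = 1

print(Thedict)

assuming sample.txt holds the text from the question.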

Upvotes: 1

Moses Koledoye

Reputation: 78536

You can't access di while it is being populated.

Instead, simply use a Counter:

from collections import Counter

counter = Counter()
with open('test_readme.txt') as f:
    for line in f:
        counter += Counter(line.split())
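A variation on the same idea, as a hedged sketch: Counter.update accepts an iterable of words and adds to the existing counts in place, so no temporary Counter is built for each line. The helper name count_words and the in-memory sample are assumptions for illustration; io.StringIO stands in for the open file.

```python
import io
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    counter = Counter()
    for line in lines:
        # update() increments the counts in place for each word in the line
        counter.update(line.split())
    return counter


# Hypothetical sample input; a real file object can be passed the same way.
sample = io.StringIO("this is a test\nthis is only a test\n")
counts = count_words(sample)
print(counts)
```

Because the function only iterates over its argument, it can be called as count_words(f) inside the with block shown above.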

Upvotes: 2

Colonel Beauvel

Reputation: 31161

I would use Counter, but on the whole string:

from collections import Counter

with open('readme.txt') as f:
   s = Counter(f.read().replace('\n', '').split(' '))

#Out[8]: Counter({'this': 2, 'is': 2, 'test': 2, 'count': 1, 'words.As': 1, 'said': 1, 'text': 1, 'of': 1, 'some': 1, ',i': 1, 'to': 1, 'only': 1, 'Hi': 1, 'a': 1, 'file': 1,
#         'recognize': 1, 'the': 1, 'file.': 1, 'repeat': 1, 'before': 1})
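Splitting on spaces leaves punctuation attached to the words, hence keys such as 'words.As' and 'file.' in the output above. One possible refinement, sketched here under the assumption that a word is any run of letters, digits, or underscores (the \w+ pattern), is to extract the words with re.findall. The function name and the inline sample text are illustrative.

```python
import re
from collections import Counter


def count_words(text):
    # Assumption: a "word" is a maximal run of \w characters, so full stops,
    # commas, and newlines act as separators instead of sticking to the words.
    return Counter(re.findall(r"\w+", text))


# The sample text from the question, inlined so the sketch is self-contained.
text = ("Hi this is some text to recognize the count of words.As said "
        "before this is only a test file ,i repeat test file.\n")
counts = count_words(text)
print(counts["file"], counts["words"], counts["As"])
```

The counts stay case-sensitive, so 'As' and 'as' would be tallied separately; lowercasing the text first is a further choice, not something the original answer does.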

Upvotes: 1
