Reputation: 46899
In the following code I am trying to use a dictionary comprehension to count how many times each word occurs, duplicates included, but this results in {'count': 1, 'words.As': 1, 'said': 1, 'file.\n': 1, 'this': 1, 'text': 1, 'is': 1, 'of': 1, 'some': 1, ',i': 1, 'to': 1, 'only': 1, 'Hi': 1, 'a': 1, 'file': 1, 'recognize': 1, 'test': 1, 'the': 1, 'repeat': 1, 'before': 1}
I do not see 'is' counted twice, or any other word for that matter. What am I doing wrong here?
test_readme.txt
Hi this is some text to recognize the count of words.As said before this is only a test file ,i repeat test file.
with open('test_readme.txt') as f:
    di = { w : di[w]+1 if w in di else 1 for l in f for w in l.split(' ')}
print di
Upvotes: 0
Views: 2155
Reputation: 52929
Yet another Counter solution: this one feeds the whole file into a single Counter call, iterating over it with a nested generator expression:
from collections import Counter
with open('test_readme.txt') as f:
    counts = Counter(word for line in f for word in line.strip().split())
And as already pointed out, you can't access a variable from inside the expression that produces the value being assigned to it; in other words, you can't see the intermediate results of an expression. The expression is evaluated first, and only then is the result stored. Since a dictionary comprehension is a single expression, it is evaluated in full before the resulting dict is bound to di.
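This also explains why the question's code printed all 1s instead of raising a NameError. The following is a minimal sketch, assuming di had already been bound to an empty dict earlier in the session (the question doesn't show this; it is an assumption). The comprehension reads that old dict, which never changes while the new one is being built:

```python
# Hypothetical reproduction: assumes `di` already exists as an empty dict,
# e.g. left over from an earlier line in an interactive session.
words = "this is a test this is".split()

di = {}
# While the comprehension runs, `di` still names the OLD empty dict;
# the new dict is only bound to `di` after the whole expression finishes,
# so `w in di` is always False.
di = {w: di[w] + 1 if w in di else 1 for w in words}

print(di)  # every count is 1, even for the repeated words
```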
Upvotes: 1
Reputation: 107287
You can't use a dictionary comprehension for this, because di is not bound or updated until the whole comprehension has finished, so your code will raise a NameError if you haven't already defined the dictionary.
>>> s = """Hi this is some text to recognize the count of words.
... As said before this is only a test file ,i repeat test file."""
>>>
>>> di = { w : di[w]+1 if w in di else 1 for l in s.split('\n') for w in l.split(' ')}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <dictcomp>
NameError: global name 'di' is not defined
You can use a defaultdict() or a Counter() from the collections module:
from collections import defaultdict
di = defaultdict(int)
with open('test_readme.txt') as f:
    for line in f:
        for w in line.strip().split():
            di[w]+=1
Demo:
>>> for line in s.split('\n'):
...     for w in line.strip().split():
...         di[w]+=1
...
>>> di
defaultdict(<type 'int'>, {'count': 1, 'a': 1, 'said': 1, 'words.': 1, 'this': 2, 'text': 1, 'is': 2, 'of': 1, 'some': 1, 'only': 1, ',i': 1, 'to': 1, 'As': 1, 'Hi': 1, 'file': 1, 'recognize': 1, 'test': 2, 'the': 1, 'file.': 1, 'repeat': 1, 'before': 1})
>>>
Upvotes: 2
Reputation: 21
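A very readable solution would be:
Thedict = {}
with open('sample.txt') as fo:
    for line in fo:
        for word in line.split():
            # split() already removes whitespace; drop surrounding periods so 'file.' and 'file' match
            word = word.strip('.')
            if word in Thedict:
                Thedict[word] = Thedict[word] + 1
            else:
                Thedict[word] = 1  # first occurrence counts as 1, not 0
print(Thedict)
assuming sample.txt contains the text from the question.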
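The strip('.') call above only handles periods, so a token like ',i' from the question's output keeps its comma. A sketch that instead trims all ASCII punctuation from both ends of each token (using string.punctuation, which goes beyond what this answer does) and skips tokens that end up empty; count_words is a hypothetical helper name:

```python
import string
from collections import Counter

def count_words(text):
    """Hypothetical helper: count words after trimming leading/trailing ASCII punctuation."""
    counts = Counter()
    for token in text.split():
        word = token.strip(string.punctuation)
        if word:  # skip tokens that were pure punctuation
            counts[word] += 1
    return counts

sample = ("Hi this is some text to recognize the count of words.As said "
          "before this is only a test file ,i repeat test file.")
counts = count_words(sample)
print(counts['file'], counts['test'], counts['i'])
```

Because strip() only trims the ends of a token, 'words.As' (no space after the period in the file) is still counted as a single word.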
Upvotes: 1
Reputation: 78536
You can't access di while it is being populated. Instead, simply use a Counter:
from collections import Counter
counter = Counter()
with open('test_readme.txt') as f:
    for line in f:
        counter += Counter(line.split())
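Building a new Counter for every line and adding it with += works, but Counter.update() can add the counts to the existing totals in place. A sketch of that variant, using io.StringIO with made-up sample text in place of test_readme.txt (an assumption so it runs without the file):

```python
import io
from collections import Counter

# Stand-in for open('test_readme.txt'); assumed two-line sample text.
f = io.StringIO(u"this is a test\nthis is only a test file\n")

counter = Counter()
for line in f:
    # update() adds the counts of the iterable's items to the existing totals,
    # instead of building a separate Counter for each line as `+=` does above.
    counter.update(line.split())

print(counter)
```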
Upvotes: 2
Reputation: 31161
I would use a Counter, but on the whole string:
from collections import Counter
with open('readme.txt') as f:
    s = Counter(f.read().replace('\n', '').split(' '))
#Out[8]: Counter({'this': 2, 'is': 2, 'test': 2, 'count': 1, 'words.As': 1, 'said': 1, 'text': 1, 'of': 1, 'some': 1, ',i': 1, 'to': 1, 'only': 1, 'Hi': 1, 'a': 1, 'file': 1, 'recognize': 1, 'the': 1, 'file.': 1, 'repeat': 1, 'before': 1})
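If the file has more than one line, replacing '\n' with '' glues the last word of one line to the first word of the next. str.split() with no argument already splits on any run of whitespace, newlines included, so the replace() call can be dropped. A small comparison on an assumed two-line sample standing in for readme.txt:

```python
from collections import Counter

# Assumed two-line sample, standing in for the contents of readme.txt.
text = "only a test file\nfile test\n"

# replace('\n', '') glues 'file' and 'file' together across the line break.
glued = Counter(text.replace('\n', '').split(' '))
# split() with no argument splits on any whitespace, newlines included.
clean = Counter(text.split())

print(glued['filefile'], clean['file'])
```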
Upvotes: 1