Reputation: 157
I am trying to count every character in a file and store the counts in a dictionary, but it doesn't quite work: I don't get all the characters.
```python
#!/usr/bin/env python
import os,sys

def count_chars(p):
    indx = {}
    file = open(p)
    current = 0
    for ch in file.readlines():
        c = ch[current:current+1]
        if c in indx:
            indx[c] = indx[c]+1
        else:
            indx[c] = 1
        current+=1
    print indx

if len(sys.argv) > 1:
    for e in sys.argv[1:]:
        print e, "contains:"
        count_chars(e)
else:
    print "[#] Usage: ./aufg2.py <filename>"
```
Upvotes: 1
Views: 254
Reputation: 142146
I've posted this as a comment on @Amber's answer, but will reiterate it here.
To count the occurrences of bytes in a file, you can build a small iterator:
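```python
from collections import Counter

with open('file') as fin:
    # iter(callable, sentinel) keeps calling fin.read(1) until it returns '' (EOF)
    chars = iter(lambda: fin.read(1), '')
    counts = Counter(chars)
```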
This way the underlying buffering of `fin` still applies, while your code only ever asks for one byte at a time (rather than managing a block size yourself, which the OS does on its own anyway). It also avoids calling `update` on the `Counter` object, so the counting becomes a single, complete, stand-alone statement.
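A minimal Python 3 sketch of the same idea, assuming you want raw bytes rather than decoded characters: in binary mode `read(1)` returns `bytes` objects, so the sentinel has to be `b''` instead of `''`. The helper name `count_bytes` is hypothetical, and `io.BytesIO` stands in for a file opened with `open(path, 'rb')`.

```python
import io
from collections import Counter


def count_bytes(fin):
    """Count single bytes read from a binary file object (hypothetical helper)."""
    # iter(callable, sentinel) calls fin.read(1) until it returns b'' (EOF)
    chunks = iter(lambda: fin.read(1), b'')
    return Counter(chunks)


# Usage: an in-memory binary stream stands in for open(path, 'rb')
counts = count_bytes(io.BytesIO(b'abracadabra'))
print(counts[b'a'])  # prints 5
```

With a sentinel of `''` on a binary stream, the loop would never stop, because `b''` never compares equal to `''` in Python 3.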
Upvotes: 1
Reputation: 351
Use a `defaultdict`. If you access a key that doesn't exist in a `defaultdict`, it creates the key and calls the factory passed as the constructor's first argument to produce its value (here `int`, so new counts start at 0).
```python
import collections

def count_chars(p):
    d = collections.defaultdict(int)  # missing keys start at int() == 0
    with open(p) as f:                # closes the file when done
        for letter in f.read():
            d[letter] += 1
    return d
```
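A short Python 3 illustration of the `defaultdict` behaviour described above. Note that merely reading a missing key also inserts it, which can be surprising if you later iterate over the dictionary.

```python
import collections

d = collections.defaultdict(int)
d['a'] += 1        # 'a' was missing: int() supplies 0, then += 1 stores 1
print(d['a'])      # prints 1
print(d['z'])      # prints 0; just reading a missing key inserts it
print('z' in d)    # prints True
```

If you want to look up counts without inserting keys, `d.get(key, 0)` does not trigger the factory.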
Upvotes: 1
Reputation: 526593
Assuming the file you're counting fits reasonably in memory:
```python
import collections

with open(p) as f:
    indx = collections.Counter(f.read())
```
Otherwise, you can read it in chunks:
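```python
import collections

with open(p) as f:
    indx = collections.Counter()
    buffer = f.read(1024)      # read up to 1024 characters at a time
    while buffer:              # read() returns '' at end of file
        indx.update(buffer)    # count every character in this chunk
        buffer = f.read(1024)
```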
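The chunked loop can be wrapped in a reusable function. This is a Python 3 sketch; the name `count_chars_chunked` and its `chunk_size` parameter are hypothetical. `Counter.update` accepts any iterable, so passing a string chunk counts each of its characters, and `most_common(n)` then lists the most frequent ones.

```python
import collections
import io


def count_chars_chunked(f, chunk_size=1024):
    """Count characters from an open text file object, chunk_size at a time."""
    counts = collections.Counter()
    # f.read() returns '' at end of file, which stops the iterator
    for chunk in iter(lambda: f.read(chunk_size), ''):
        counts.update(chunk)  # counts every character in the chunk
    return counts


# Usage: io.StringIO stands in for open(p); a tiny chunk size shows that
# the totals do not depend on where the chunk boundaries fall
text = 'mississippi'
counts = count_chars_chunked(io.StringIO(text), chunk_size=3)
print(counts.most_common(2))
```

Because each chunk is counted independently and the results are summed, the chunk size only affects memory use, not the final counts.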
Upvotes: 7
Reputation: 500357
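The main problem is that you only examine (at most!) one character from every line: on line n (counting from 0) you look only at the character at index n, and once n runs past the end of a line the slice is empty, so you end up counting `''` instead. If you're reading the file line by line, you need an inner loop that iterates over each line's characters.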
```python
#!/usr/bin/env python
import os, sys, collections

def count_chars(p):
    indx = collections.Counter()
    with open(p) as f:
        for line in f:
            for c in line:
                indx[c] += 1
    print indx

if len(sys.argv) > 1:
    for e in sys.argv[1:]:
        print e, "contains:"
        count_chars(e)
else:
    print "[#] Usage: ./aufg2.py <filename>"
```
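To see why the original version misses characters, here is a small Python 3 sketch that replays its slicing on a few made-up lines, then counts the same lines with a nested loop for comparison.

```python
from collections import Counter

lines = ['abc\n', 'de\n', 'f\n']

# Replay the question's logic: one slice per line, at an index that grows
seen = []
for current, line in enumerate(lines):
    seen.append(line[current:current + 1])

print(seen)  # prints ['a', 'e', '']

# The fix: visit every character of every line
fixed = Counter(c for line in lines for c in line)
print(sum(fixed.values()))  # prints 9
```

The buggy version looks at only three slices for nine characters, and one of those slices is the empty string.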
Upvotes: 2