Luzius L

Reputation: 157

Count every character from file

I am trying to count every character in a file and store the counts in a dictionary. But it doesn't quite work: I don't get all the characters.

#!/usr/bin/env python
import os,sys

def count_chars(p):
     indx = {}
     file = open(p)

     current = 0
     for ch in file.readlines():
          c = ch[current:current+1]
          if c in indx:
               indx[c] = indx[c]+1
          else:
               indx[c] = 1           
          current+=1
     print indx

if len(sys.argv) > 1:
     for e in sys.argv[1:]:
          print e, "contains:"
          count_chars(e)
else:
     print "[#] Usage: ./aufg2.py <filename>"

Upvotes: 1

Views: 254

Answers (4)

Jon Clements

Reputation: 142146

I've posted this as a comment to @Amber's answer, but will re-iterate here...

To count the occurrences of each byte in a file, build a small iterator:

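from collections import Counter

with open('file') as fin:
    # iter(callable, sentinel) calls fin.read(1) until it returns '' (end of file)
    chars = iter(lambda: fin.read(1), '')
    counts = Counter(chars)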

This way the underlying buffering from fin still applies, but the code makes it explicit that you're reading one byte at a time (instead of picking a block size, which the OS will do on its own anyway). It also avoids calling update on the Counter object, so the whole thing becomes more of a complete, stand-alone statement.
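As a rough illustration of the two-argument form of iter used above, here is a minimal sketch. It assumes an in-memory io.StringIO as a stand-in for a real open file; the helper name count_chars_stream and the sample text are made up for the example.

```python
import io
from collections import Counter

def count_chars_stream(stream):
    """Count characters by reading one at a time until read() returns ''."""
    # iter(callable, sentinel) keeps calling the callable until it returns the sentinel
    return Counter(iter(lambda: stream.read(1), ''))

# Hypothetical stand-in for an open text file
sample = io.StringIO(u"abca\nb")
counts = count_chars_stream(sample)
print(dict(counts))
```

The newline is counted like any other character, so strip or filter it afterwards if you only want visible characters.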

Upvotes: 1

riamse

Reputation: 351

Use a defaultdict. Basically, if you try to get a nonexistent item from a defaultdict, it creates the key and calls the first argument passed to the constructor (the default factory) to produce the value. With int as the factory, every new key starts at 0.

import collections

def count_chars(p):
    d = collections.defaultdict(int)
    # Use a with-block so the file is closed once it has been read
    with open(p) as f:
        for letter in f.read():
            d[letter] += 1
    return d
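To make the defaultdict behaviour concrete, here is a small sketch on an in-memory string rather than a file. The sample word and the key 'z' are chosen only for illustration.

```python
import collections

d = collections.defaultdict(int)  # int() returns 0, so missing keys start at 0

for letter in "hello":            # sample text, used only for illustration
    d[letter] += 1

# Merely looking up a missing key also creates it with the default value
missing = d['z']

print(dict(d))
```

Because a lookup alone inserts the key, use `'z' in d` or `d.get('z')` when you want to check for a character without adding it.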

Upvotes: 1

Amber

Reputation: 526593

Assuming the file you're counting fits reasonably in memory:

import collections
with open(p) as f:
    indx = collections.Counter(f.read())

Otherwise, you can read it in fixed-size chunks:

import collections
with open(p) as f:
    indx = collections.Counter()
    buffer = f.read(1024)
    while buffer:
        indx.update(buffer)
        buffer = f.read(1024)
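As a quick sanity check that reading in chunks gives the same counts as reading everything at once, here is a sketch that wraps the loop in a function. It assumes an io.StringIO in place of a real file; count_in_chunks, the sample text, and the chunk size of 64 are all made up for the example.

```python
import collections
import io

def count_in_chunks(f, chunk_size=1024):
    """Count characters from a file-like object, reading chunk_size at a time."""
    counts = collections.Counter()
    chunk = f.read(chunk_size)
    while chunk:                      # read() returns '' at end of file
        counts.update(chunk)          # adds one count per character in the chunk
        chunk = f.read(chunk_size)
    return counts

text = u"spam and eggs\n" * 100       # hypothetical file contents
chunked = count_in_chunks(io.StringIO(text), chunk_size=64)
whole = collections.Counter(text)
print(chunked == whole)
```

The chunk size only affects memory use and the number of read calls, not the resulting counts.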

Upvotes: 7

NPE

Reputation: 500357

The main problem is that you only examine (at most!) one character from every line. If you're reading the file line by line, you need to have an inner loop that would iterate over the line's characters.

#!/usr/bin/env python
import os, sys, collections

def count_chars(p):
     indx = collections.Counter()
     with open(p) as f:
         for line in f:
             for c in line:
                 indx[c] += 1
     print indx

if len(sys.argv) > 1:
     for e in sys.argv[1:]:
          print e, "contains:"
          count_chars(e)
else:
     print "[#] Usage: ./aufg2.py <filename>"
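To see why the loop in the question misses characters, here is a small sketch that compares what it examines with what the nested loop examines. It assumes a made-up three-line input in place of the file.

```python
# Hypothetical file contents, in the form readlines() would return them
lines = ["abc\n", "def\n", "ghi\n"]

# What the question's loop looks at: position 0 of line 0, position 1 of line 1, ...
examined = [line[i:i + 1] for i, line in enumerate(lines)]

# What the nested loop looks at: every character of every line
all_chars = [c for line in lines for c in line]

print(examined)
print(len(all_chars))
```

The original loop walks a diagonal through the file, and once the index runs past the end of a line the slice is an empty string, which then gets counted as a key of its own.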

Upvotes: 2
