neif

Reputation: 510

Python: Counting a specific set of character occurrences in lines of a file

I am struggling with a small program in Python which aims at counting the occurrences of a specific set of characters in the lines of a text file.

As an example, if I want to count '!' and '@' in the following lines:

hi!
[email protected]
collection!

I'd expect the following output:

!;2
@;1

So far I have working code, but it is inefficient and does not take advantage of Python's standard library. I have tried collections.Counter, with limited success. The blocker is that I could not restrict Counter.update() to a specific set of characters: every other character in the line was counted as well, so I would then have to filter out the ones I am not interested in, which adds another loop. I also considered regular expressions, but I can't see an advantage in this case.

This is the working code I have right now (the simplest idea I could come up with), which looks for special characters in the lines of a file. I'd like to see if someone can come up with a neater, more Pythonic approach:

def count_special_chars(filename):
    special_chars = list('!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ ')
    dict_count = dict(zip(special_chars, [0] * len(special_chars)))

    with open(filename) as f:
        for passw in f:
            for c in passw:
                if c in special_chars:
                    dict_count[c] += 1
    return dict_count

Thanks for checking.

Upvotes: 1

Views: 3726

Answers (4)

Zioalex

Reputation: 4333

I did something like this, which does not need the Counter class. I used it to count all the special characters together, but you can adapt it to store the count of each character in a dict.

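import re

# Character set taken from the question; adjust as needed.
special_chars = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ '

def countSpecial(passwd):
    specialcount = 0
    for special in special_chars:
        # re.escape makes each character match literally in the pattern
        length = len(re.findall(re.escape(special), passwd))
        if length > 0:
            specialcount = length + specialcount
    return specialcount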
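
Adapting the function above to keep one count per character, as suggested, might look like the following sketch. The name count_special_by_char is hypothetical, and SPECIAL_CHARS is assumed to be the character set from the question.

```python
import re

# Assumption: the same character set as in the question.
SPECIAL_CHARS = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ '

def count_special_by_char(text):
    """Return a dict mapping each special character found in text to its count."""
    counts = {}
    for special in SPECIAL_CHARS:
        # re.escape makes the character match literally, whatever it is.
        found = len(re.findall(re.escape(special), text))
        if found > 0:
            counts[special] = found
    return counts
```

Calling count_special_by_char(passwd) on each line and merging the results, or calling it once on the whole file contents, would give the per-character totals the question asks for.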

Upvotes: 1

cocoatomo

Reputation: 5492

  • There is no need to process the file contents line by line.
  • Avoid nested loops, which increase the complexity of your program.
    • To count character occurrences in a string, first loop over the entire string once to build an occurrence dict. You can then look up the count of any character in that dict. This reduces the complexity of the program.
      • When building the occurrence dict, defaultdict initializes the count values for you.

A refactored version of the program is as below:

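from collections import defaultdict

filename = 'example.txt'  # placeholder: path to your input file
special_chars = list('!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ ')
dict_count = defaultdict(int)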

with open(filename) as f:
    for c in f.read():
        dict_count[c] += 1

for c in special_chars:
    print('{0};{1}'.format(c, dict_count[c]))

ref. defaultdict Examples: https://docs.python.org/3.4/library/collections.html#defaultdict-examples

Upvotes: 1

kojiro

Reputation: 77167

Eliminating the extra counts from collections.Counter is probably not significant either way, but if it bothers you, do it during the initial iteration:

from collections import Counter
special_chars = '''!"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ '''
with open(yourfile) as f:  # yourfile: path to the input file; "with" closes it afterwards
    found_chars = [c for c in f.read() if c in special_chars]
counted_chars = Counter(found_chars)
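
For large files, the same filtering idea can be applied line by line, so the whole file is never held in memory. The following is a minimal sketch under that assumption; format_counts is a hypothetical helper that produces the `char;count` lines from the question.

```python
from collections import Counter

# A set gives constant-time membership tests while filtering.
SPECIAL_CHARS = set('!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ ')

def count_special_chars(filename):
    """Count only the special characters, reading the file line by line."""
    counts = Counter()
    with open(filename) as f:
        for line in f:
            # The generator filters characters before Counter ever sees them.
            counts.update(c for c in line if c in SPECIAL_CHARS)
    return counts

def format_counts(counts):
    """Return 'char;count' strings, sorted by character."""
    return ['{0};{1}'.format(char, n) for char, n in sorted(counts.items())]
```

Printing '\n'.join(format_counts(count_special_chars(yourfile))) would then give output in the format shown in the question, with characters that never occur left out.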

Upvotes: 1

zyxue

Reputation: 8918

Why not count over the whole file at once? You should avoid looping through the string for each line of the file. Use str.count instead.

from pprint import pprint

# Better coding style: define constants outside the function
SPECIAL_CHARS = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ '

def count_special_chars(filename):
    with open(filename) as f:
        content = f.read()
        return {i: content.count(i) for i in SPECIAL_CHARS}

pprint(count_special_chars('example.txt'))

example output:

{' ': 0,
 '!': 2,
 '.': 1,
 '@': 1,
 '[': 0,
 '~': 0
 # the remaining keys with a value of zero are ignored
  ...}
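
If the zero-valued entries are not wanted in the result at all, one option is to filter them out while building the dict. This is a sketch only; count_present_special_chars is a hypothetical name, and it takes the text directly rather than a filename.

```python
SPECIAL_CHARS = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ '

def count_present_special_chars(text):
    """Return counts only for the special characters that occur in text."""
    # str.count scans the text once per character in SPECIAL_CHARS.
    counts = ((c, text.count(c)) for c in SPECIAL_CHARS)
    return {c: n for c, n in counts if n > 0}
```

Note that str.count scans the content once for every character in SPECIAL_CHARS, so this trades a single Python-level loop for several fast scans done inside the interpreter.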

Upvotes: 3
