octain

Reputation: 964

Python Count how many types of characters in a file

I'm new to Python and am writing a script that does a lot of file I/O. One function is supposed to count how many of the character types in [OHCN] appear in a file, not how many occurrences of each. For example, if a file contains "OOOOOHHHHNNN", the result would be 3. Here is what I have; is there a better and more efficient way of doing this?

One more question: I am doing a lot of file editing in this script, and at the moment I have a few functions that each open the files that need to be modified. Would it be more efficient to handle everything in one function (open the file once and do everything I need to do in it), or to have each function open the file, do its work, and close it, then have the next function open it again, and so on? Thank you again for any help.

def ReadFile(xyzfile, inputFile):
    key_atoms = "OHCN"
    # Read the whole coordinate file into memory
    with open(xyzfile) as f:
        text = f.read()

    # Tally the occurrences of each key atom
    atom_count = {ltr: 0 for ltr in key_atoms}
    for char in text:
        if char in key_atoms:
            atom_count[char] += 1

    for key in sorted(atom_count):
        with open(inputFile) as f:
            string1 = "ntyp = 2"
            string2 = "ntyp = {}".format(atom_count[key])
            s = f.read()
            s = s.replace(string1, string2)

Upvotes: 0

Views: 219

Answers (2)

Jon Clements

Reputation: 142156

If you're after the number of unique atom types (characters), you can use a set and take its intersection with the characters in the file. The file can be iterated character by character without reading it all into memory; itertools.chain.from_iterable flattens the lines into characters, which replaces a nested loop. Opening both files in a single with statement also gives an all-or-nothing approach: if either xyzfile or input_file can't be opened, there's no point in proceeding. Your current code looks like it can be reduced to:

from itertools import chain

with open(xyzfile) as f1, open(input_file) as f2:
    atom_count = len(set('OHCN').intersection(chain.from_iterable(f1)))
    s = f2.read().replace('ntyp = 2', 'ntyp = {}'.format(atom_count))

Your replacement could probably be more efficient, but the question doesn't specify what s is used for.
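For reference, here is a rough, self-contained sketch of the same idea split into two functions. The names count_atom_types and update_ntyp are made up for illustration, and writing the result back to the input file is an assumption, since the question doesn't say what happens to s afterwards.

```python
from itertools import chain


def count_atom_types(xyz_path, key_atoms="OHCN"):
    """Return how many of the characters in key_atoms occur at least once in the file."""
    with open(xyz_path) as f:
        # chain.from_iterable(f) yields the file one character at a time,
        # line by line, so the whole file is never held in memory at once.
        return len(set(key_atoms).intersection(chain.from_iterable(f)))


def update_ntyp(input_path, atom_types, old="ntyp = 2"):
    """Replace the placeholder text `old` with the computed count and save the file.

    Assumption: the edited text should be written back to the same file.
    """
    with open(input_path) as f:
        text = f.read()
    text = text.replace(old, "ntyp = {}".format(atom_types))
    with open(input_path, "w") as f:
        f.write(text)
    return text
```

Calling update_ntyp(input_file, count_atom_types(xyzfile)) would then do the count and the edit, with each file opened only for as long as it is needed.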

Upvotes: 1

inspectorG4dget

Reputation: 113965

counts = {}
with open(infilepath) as infile:
    for line in infile:
        for char in line:
            if char not in counts:
                counts[char] = 0
            counts[char] += 1

print("There are", len(counts), "different characters in the file")
for key in counts:
    print("There are", counts[key], "occurrences of", key, "in the file")
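If only the O, H, C and N characters matter, as in the question, the same line-by-line loop can be restricted to those keys. A minimal sketch using collections.Counter; the function name count_key_atoms is hypothetical:

```python
from collections import Counter


def count_key_atoms(path, key_atoms="OHCN"):
    """Count occurrences of each key atom character, reading the file line by line."""
    counts = Counter()
    with open(path) as infile:
        for line in infile:
            # Tally only the characters we care about; everything else is skipped
            counts.update(ch for ch in line if ch in key_atoms)
    return counts
```

Because a Counter only holds keys it has actually seen, len(count_key_atoms(infilepath)) gives the number of distinct atom types present, while the individual values give the occurrences.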

Upvotes: 0
