user175924

Reputation: 59

Counting the total number of lines except the ones that start with a special character

I want to get the number of atoms from a text file. The file starts with a couple of header lines and may sometimes contain additional info lines, which also start with special characters. A sample text file looks like this:

% site-data vn=3.0
#                        pos
Ga        0.0000000   0.0000000   0.0000000
As        0.2500000   0.2500000   0.2500000 

My approach was to count the total number of lines and subtract the number of lines that start with special characters. Here is my attempt:

def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        x = len(site.readlines())
        for line in site.readlines():
            if '#' in line or '%' in line:
                count +=1
    return x-count

The problem with this function is that once x (the total number of lines) is defined, the counter (the number of lines that start with special characters) stays at 0. If I delete the line that defines x, the counter works. I could split this into two functions, but I believe it should work as written, and I want to know what I'm doing wrong.

Upvotes: 0

Views: 974

Answers (4)

Waket Zheng

Reputation: 6331

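Iterate over the file object directly instead of calling readlines() twice, and count only the lines that are not headers:

def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        for line in site:  # yields one line at a time
            if '#' not in line and '%' not in line:
                count +=1
    return count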

As mozway's answer points out, startswith is a better solution, so the code can look like this:

from pathlib import Path
from typing import Union

IGNORE = ('#', '%')

def get_atom_number(filename: str = sitefile, ignore_chars: Union[str, tuple] = IGNORE) -> int:
    '''Count the lines in filename that do not start with any of ignore_chars.'''
    return sum(1 for line in Path(filename).read_text().splitlines() if not line.startswith(ignore_chars))

Upvotes: 0

mozway

Reputation: 260430

if '#' in line or '%' in line: will check if the characters are anywhere in the line. Use startswith instead:

if line.startswith(('#', '%')):
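To illustrate the difference, here is a small sketch. The second line, with a trailing comment, is a hypothetical example that does not appear in the sample file; it is only there to show where the two checks disagree.

```python
# Hypothetical lines: the trailing comment on the atom line is an assumption
# made for illustration, it is not part of the sample file in the question.
header = "#                        pos"
atom_with_comment = "Ga  0.0  0.0  0.0  # gallium"

def contains_marker(line):
    # True if a marker character appears anywhere in the line.
    return '#' in line or '%' in line

def starts_with_marker(line):
    # True only if the line begins with a marker character.
    return line.startswith(('#', '%'))

print(contains_marker(header), starts_with_marker(header))
# prints: True True
print(contains_marker(atom_with_comment), starts_with_marker(atom_with_comment))
# prints: True False
```

With the `in` check, the atom line would be wrongly treated as a header; with startswith it is counted as an atom.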

Now, regarding the counting method, you can also increment the counter only when the line does not start with one of these characters. That way you don't need to know the total number of lines in advance and don't need to read all the lines up front:

if not line.startswith(('#', '%')):
    counter += 1

Then you can return the counter directly at the end.

Full code:

def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        for line in site:
            if not line.startswith(('#', '%')):
                count +=1
    return count
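For anyone who wants to try this on the sample data from the question, here is a self-contained sketch. The name count_atom_lines, the path parameter and the temporary file are my own additions for illustration. It also skips blank lines, which is an assumption on my part: the question does not say whether blank lines can occur or how they should be treated.

```python
import tempfile
from pathlib import Path

# Sample data copied from the question.
SAMPLE = """% site-data vn=3.0
#                        pos
Ga        0.0000000   0.0000000   0.0000000
As        0.2500000   0.2500000   0.2500000
"""

def count_atom_lines(path, markers=('#', '%')):
    """Count lines that do not start with one of the marker characters.

    Hypothetical helper; skipping blank lines is an assumption, not
    something the question asks for.
    """
    count = 0
    with open(path, 'r') as site:
        for line in site:  # reads the file lazily, one line at a time
            if line.strip() and not line.startswith(markers):
                count += 1
    return count

# Write the sample to a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "site.txt"
    sample_path.write_text(SAMPLE)
    atom_count = count_atom_lines(sample_path)

print(atom_count)  # prints: 2
```

Passing the path as an argument instead of relying on a global sitefile makes the function easier to test, but that is a design preference rather than a requirement.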

Upvotes: 3

imentu

Reputation: 11

The first call to site.readlines(), at line 4 of your code, moves the file cursor to the end of the file, so the second call to site.readlines(), at line 5, only gets an empty list. You can try the code below, which saves the result of site.readlines() in a variable named lines. I think it will solve your problem.

def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        lines = site.readlines()
        x = len(lines)
        for line in lines:
            if '#' in line or '%' in line:
                count +=1
    return x - count
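The cursor behaviour can be seen in isolation with a short sketch. It uses io.StringIO as an in-memory stand-in for the real file (an assumption made so the example runs without a file on disk), and the three lines of content are made up. seek(0) is shown only as an alternative way to rewind; storing the lines in a variable, as above, is usually simpler.

```python
import io

# In-memory stand-in for the opened site file; the content is illustrative.
site = io.StringIO("% header\nGa 0 0 0\nAs 0.25 0.25 0.25\n")

first = site.readlines()           # reads everything, cursor is now at the end
position_after_read = site.tell()  # non-zero: the cursor has moved
second = site.readlines()          # nothing left to read

site.seek(0)                       # move the cursor back to the start
third = site.readlines()           # the lines are available again

print(len(first), len(second), len(third))  # prints: 3 0 3
```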

Upvotes: 0

Green Cloak Guy

Reputation: 24691

The problem you're facing is that .readlines() consumes the entire file when it executes. If you call it again, nothing comes out, since it's already at the end of the file.

The solution is to assign site.readlines() to a variable first, and then change the following two lines to refer to that variable. This way, you're only calling it once.

def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        lines = site.readlines()
        x = len(lines)
        for line in lines:
            if '#' in line or '%' in line:
                count +=1
    return x - count

Upvotes: 2
