Reputation: 59
I want to get the number of atoms from a text file. The file starts with a couple of header lines, and sometimes it contains additional info lines, which also start with special characters. A sample text file looks like this:
% site-data vn=3.0
# pos
Ga 0.0000000 0.0000000 0.0000000
As 0.2500000 0.2500000 0.2500000
My approach was to count the total number of lines and the number of lines that start with special characters, then subtract one from the other. Here is my attempt:
def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        x = len(site.readlines())
        for line in site.readlines():
            if '#' in line or '%' in line:
                count +=1
    return x-count
The problem with this function is, with the x (total number of lines) defined, the counter (num. of lines that start with special chars) returns 0. If I delete that line, it works. Now, I can divide these two into two functions, but I believe this should work fine, and I want to know what I'm doing wrong.
Upvotes: 0
Views: 974
Reputation: 6331
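Iterate over the file object line by line instead of calling readlines() twice:
def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        # Iterating over the file object yields one line at a time.
        for line in site:
            if '#' not in line and '%' not in line:
                count +=1
    return count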
As mozway's answer points out, startswith is a better check, so the code can look like this:
from pathlib import Path
from typing import Union

IGNORE = ('#', '%')

def get_atom_number(filename: str = sitefile, ignore_chars: Union[str, tuple] = IGNORE) -> int:
    '''Count the lines in filename that do not start with any of ignore_chars.'''
    return sum(1 for line in Path(filename).read_text().splitlines() if not line.startswith(ignore_chars))
Upvotes: 0
Reputation: 260430
if '#' in line or '%' in line:
will check if the characters are anywhere in the line. Use startswith instead:
if line.startswith(('#', '%')):
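To see why the distinction matters, here is a small sketch comparing the two checks. The atom line with a trailing comment is a hypothetical example, not something taken from the question's file, and the helper names are made up for illustration.

```python
# Hypothetical lines: a header line, and an atom line with a trailing comment.
header = "# pos"
atom_with_comment = "Ga 0.0000000 0.0000000 0.0000000 # origin"

def is_special_in(line):
    # Matches '#' or '%' anywhere in the line.
    return '#' in line or '%' in line

def is_special_startswith(line):
    # Matches only when the line begins with '#' or '%'.
    return line.startswith(('#', '%'))

print(is_special_in(header), is_special_startswith(header))
print(is_special_in(atom_with_comment), is_special_startswith(atom_with_comment))
```

Both checks agree on the header line, but only the `in` check would treat the commented atom line as a special line and drop it from the count.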
Now, regarding the counting method, you can also increase the counter only when the line does not start with one of these characters. That way you don't need to know the total number of lines in advance and don't need to read all the lines up front:
if not line.startswith(('#', '%')):
    counter += 1
Then you can return the counter directly at the end.
Full code:
def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        for line in site.readlines():
            if not line.startswith(('#', '%')):
                count +=1
    return count
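As a quick check, the same logic can be run against the sample file from the question. This is only a sketch: `count_atoms` is a hypothetical variant that takes the path as a parameter instead of relying on the global `sitefile`, and the sample is written to a temporary file.

```python
import os
import tempfile

# The sample file content from the question.
SAMPLE = """% site-data vn=3.0
# pos
Ga 0.0000000 0.0000000 0.0000000
As 0.2500000 0.2500000 0.2500000
"""

def count_atoms(path):
    """Count lines that do not start with '#' or '%'."""
    count = 0
    with open(path, 'r') as site:
        for line in site:
            if not line.startswith(('#', '%')):
                count += 1
    return count

# Write the sample to a temporary file and count its atom lines.
with tempfile.NamedTemporaryFile('w', suffix='.site', delete=False) as tmp:
    tmp.write(SAMPLE)
    tmp_path = tmp.name

atoms = count_atoms(tmp_path)
os.remove(tmp_path)
print(atoms)  # 2 for the sample above
```

Note that a blank line would also be counted as an atom line here, so real files with empty lines may need an extra check.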
Upvotes: 3
Reputation: 11
The first call to site.readlines() (line 4 of your code) moves the file cursor to the end of the file, so the second call to site.readlines() (line 5) only gets an empty list.
You can try the code below, which saves the result of site.readlines() in a variable named lines. That should solve your problem.
def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        lines = site.readlines()
        x = len(lines)
        for line in lines:
            if '#' in line or '%' in line:
                count +=1
    return x - count
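Since the issue is the position of the file cursor, another option (a swapped-in alternative, not what the code above does) is to rewind the cursor with `seek(0)` before reading a second time. A minimal sketch, assuming a hypothetical `count_atoms` helper that takes an already opened text stream, with `io.StringIO` standing in for the file and `startswith` used for the header check:

```python
import io

def count_atoms(site):
    """Count non-header lines of an open text stream, reading it twice."""
    total = len(site.readlines())   # first pass moves the cursor to the end
    site.seek(0)                    # rewind to the start before reading again
    special = 0
    for line in site.readlines():
        if line.startswith(('#', '%')):
            special += 1
    return total - special

# io.StringIO stands in for the opened site file.
sample = io.StringIO(
    "% site-data vn=3.0\n"
    "# pos\n"
    "Ga 0.0000000 0.0000000 0.0000000\n"
    "As 0.2500000 0.2500000 0.2500000\n"
)
result = count_atoms(sample)
print(result)  # 2 for the sample above
```

Storing the lines in a variable is usually simpler, since it reads the file only once.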
Upvotes: 0
Reputation: 24691
The problem you're facing is that .readlines() consumes the entire file when it executes. If you call it again, nothing comes out, since it's already at the end of the file.
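The behaviour is easy to reproduce in isolation. In this sketch, `io.StringIO` stands in for the opened file (an assumption made to keep the example self-contained); a real file object opened with `open()` behaves the same way for reading.

```python
import io

# io.StringIO stands in for an opened text file here.
site = io.StringIO(
    "% site-data vn=3.0\n"
    "# pos\n"
    "Ga 0.0000000 0.0000000 0.0000000\n"
    "As 0.2500000 0.2500000 0.2500000\n"
)

first = site.readlines()   # reads everything; the cursor is now at the end
second = site.readlines()  # nothing is left to read

print(len(first))  # 4
print(second)      # []
```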
The solution is to assign site.readlines() to a variable first, and then change the following two lines to refer to that variable. This way, you're only calling it once.
def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        lines = site.readlines()
        x = len(lines)
        for line in lines:
            if '#' in line or '%' in line:
                count +=1
    return x - count
Upvotes: 2