Reputation: 713
I am still learning Python and I have an example of a file:
RDKit 3D
0 0 0 0 0 0 0 0 0 0999 V3000
M V30 BEGIN CTAB
M V30 COUNTS 552 600 0 0 0
M V30 BEGIN ATOM
M V30 1 C 7.3071 41.3785 19.7482 0
M V30 2 C 7.5456 41.3920 21.2703 0
M V30 3 C 8.3653 40.1559 21.6876 0
M V30 4 C 9.7001 40.0714 20.9228 0
M V30 5 C 9.4398 40.0712 19.4042 0
M V30 END ATOM
M V30 BEGIN BOND
M V30 0 1 1 2
M V30 1 1 1 6
M V30 2 1 1 10
M V30 3 1 1 11
M V30 4 1 2 3
M V30 END BOND
M V30 END CTAB
M END
where I want to print only the information between the following sections:
M V30 BEGIN ATOM
and:
M V30 END ATOM
As the number of atoms varies between files, I would like a generic method that works for any of them. Can anyone please help?
Many thanks.
Upvotes: 0
Views: 92
Reputation: 17
You can try the function below:
def extract_lines(filename, start_line, stop_line):
    with open(filename, 'r') as f:
        # Drop the trailing newline so lines compare equal to the markers.
        list_of_lines = [line.rstrip('\n') for line in f]
    # index() raises ValueError if a marker is missing from the file.
    start_point = list_of_lines.index(start_line)
    stop_point = list_of_lines.index(stop_line)
    return "\n".join(list_of_lines[start_point + 1:stop_point])
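Since list.index() raises ValueError when a marker is absent, a more forgiving variant may be useful. The following is only a sketch: extract_between is a hypothetical helper name, it takes an iterable of lines rather than a filename, and returning an empty list for a missing marker is an assumption about the desired behaviour.

```python
def extract_between(lines, start_line, stop_line):
    """Return the lines strictly between start_line and stop_line.

    Hypothetical helper: returns [] when either marker is missing.
    """
    stripped = [line.rstrip("\n") for line in lines]
    try:
        start = stripped.index(start_line)
        # Look for the stop marker only after the start marker.
        stop = stripped.index(stop_line, start + 1)
    except ValueError:
        return []
    return stripped[start + 1:stop]


sample = [
    "M V30 BEGIN ATOM\n",
    "M V30 1 C 7.3071 41.3785 19.7482 0\n",
    "M V30 2 C 7.5456 41.3920 21.2703 0\n",
    "M V30 END ATOM\n",
]
atoms = extract_between(sample, "M V30 BEGIN ATOM", "M V30 END ATOM")
missing = extract_between(sample, "M V30 BEGIN BOND", "M V30 END BOND")
```

With a real file, the same helper could be called as extract_between(open(path), ...), since a file object is also an iterable of lines.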
Upvotes: 0
Reputation: 1350
To keep the logic short and simple, and since you wanted a portable method:
def print_atoms_from_file(full_file_path):
    with open(full_file_path, 'r') as f:
        start_printing = False
        for line in f:
            if 'BEGIN ATOM' in line:
                start_printing = True
                continue
            if 'END ATOM' in line:
                start_printing = False
                continue
            if start_printing:
                # print() works on Python 2 and 3; strip the newline to avoid blank lines.
                print(line.rstrip('\n'))

print_atoms_from_file('test_file_name.txt')
Upvotes: 2
Reputation: 53
This is how I'd do it (with csv).
import csv

def process_file(f):
    start_found = False
    content = []
    with open(f, 'r') as f_in:
        reader = csv.reader(f_in, delimiter=' ')
        for row in reader:
            # A marker row contains all four tokens.
            if {'M', 'V30', 'BEGIN', 'ATOM'}.issubset(row):
                start_found = True
                continue
            elif {'M', 'V30', 'END', 'ATOM'}.issubset(row):
                break
            elif start_found:
                content.append(row)
    return content
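If the goal is to work with the atom data rather than just print it, each line can be split into typed fields. This is a minimal sketch: parse_atom_line is a hypothetical name, and the field order (index, element, x, y, z) is assumed from the sample file in the question, not taken from the format specification.

```python
def parse_atom_line(line):
    """Split one atom line into (index, element, x, y, z).

    Assumes the layout seen in the sample: 'M V30 <index> <element> <x> <y> <z> ...'.
    """
    fields = line.split()  # split() collapses runs of whitespace
    return (
        int(fields[2]),
        fields[3],
        float(fields[4]),
        float(fields[5]),
        float(fields[6]),
    )


atom = parse_atom_line("M V30 1 C 7.3071 41.3785 19.7482 0")
```

Using str.split() instead of csv with a single-space delimiter avoids empty fields when columns are separated by more than one space.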
Upvotes: 1
Reputation: 87
Try this:
with open('filename.txt', 'r') as f:
    ok_to_print = False
    for line in f:  # iterate lazily instead of loading the whole file
        line = line.strip()  # remove surrounding whitespace
        if line == 'M V30 BEGIN ATOM':
            ok_to_print = True
        elif line == 'M V30 END ATOM':
            ok_to_print = False
        elif ok_to_print:
            print(line)
This processes the file line by line as it is read, which is ideal for big files that don't fit in memory. For small files, you can read the whole thing into memory and use regular expressions:
import re

with open('filename.txt', 'r') as f:
    data = f.read()

# DOTALL lets '.' match newlines so the lazy group can span several lines.
a = re.compile(r'M V30 BEGIN ATOM(.+?)M V30 END ATOM', re.I | re.DOTALL)
results = a.findall(data)
for result in results:
    print(result.strip())
note: None of this code is tested. Just writing it blind.
Upvotes: 0
Reputation: 3001
You can try this:
# Read file contents
with open("file.txt") as file:
    inside = False
    for line in file:
        # Start section of interest
        if line.rstrip() == "M V30 BEGIN ATOM":
            inside = True
        # End section of interest
        elif line.rstrip() == "M V30 END ATOM":
            inside = False
        # Inside section of interest
        elif inside:
            print(line.rstrip())
        else:
            pass
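Because the question asks for a generic method, the same flag-based loop could be wrapped in a generator that takes the markers as parameters. This is a sketch only: iter_section and its parameter names are hypothetical, and the sample text is a shortened copy of the file from the question.

```python
def iter_section(lines, start_marker, end_marker):
    """Yield stripped lines found between start_marker and end_marker."""
    inside = False
    for line in lines:
        text = line.rstrip()
        if text == start_marker:
            inside = True
        elif text == end_marker:
            inside = False
        elif inside:
            yield text


sample = """M V30 BEGIN ATOM
M V30 1 C 7.3071 41.3785 19.7482 0
M V30 END ATOM
M V30 BEGIN BOND
M V30 0 1 1 2
M V30 1 1 1 6
M V30 END BOND""".splitlines()

atoms = list(iter_section(sample, "M V30 BEGIN ATOM", "M V30 END ATOM"))
bonds = list(iter_section(sample, "M V30 BEGIN BOND", "M V30 END BOND"))
```

The same function then serves the BOND block or any other BEGIN/END pair, and passing an open file object instead of a list keeps memory use low for large files.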
Upvotes: 3