Wychh

Reputation: 713

Print a Section of Text from a File

I am still learning Python and I have an example of a file:

 RDKit          3D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 552 600 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 7.3071 41.3785 19.7482 0
M  V30 2 C 7.5456 41.3920 21.2703 0
M  V30 3 C 8.3653 40.1559 21.6876 0
M  V30 4 C 9.7001 40.0714 20.9228 0
M  V30 5 C 9.4398 40.0712 19.4042 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 0 1 1 2
M  V30 1 1 1 6
M  V30 2 1 1 10
M  V30 3 1 1 11
M  V30 4 1 2 3
M  V30 END BOND
M  V30 END CTAB
M  END

where I want to print only the information between the following sections:

M  V30 BEGIN ATOM

and:

M  V30 END ATOM

As the number of atoms varies between files, I would like a generic method that can be used. Can anyone please help?

Many thanks.

Upvotes: 0

Views: 92

Answers (5)

Pranay

Reputation: 17

You can try the function below:

def extract_lines(filename, start_line, stop_line):
    # Read the file and strip the trailing newline from each line
    with open(filename, 'r') as f:
        list_of_lines = [line.rstrip('\n') for line in f]

    # list.index raises ValueError if a marker line is not found exactly
    start_point = list_of_lines.index(start_line)
    stop_point = list_of_lines.index(stop_line)

    # Return the lines strictly between the two markers
    return "\n".join(list_of_lines[start_point + 1:stop_point])
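
One caveat with `list.index`: it raises `ValueError` when a marker is absent, and the second call searches from the top of the list rather than from the start marker. Here is a minimal sketch of a variant that handles both cases. The names `extract_section` and `sample` are hypothetical and used only for illustration:

```python
def extract_section(lines, start_line, stop_line):
    """Return the lines strictly between start_line and stop_line."""
    stripped = [line.rstrip("\n") for line in lines]
    try:
        start = stripped.index(start_line)
        # Look for the end marker only after the start marker.
        stop = stripped.index(stop_line, start + 1)
    except ValueError:
        # At least one marker is missing (or the end precedes the start).
        return []
    return stripped[start + 1:stop]


# Illustrative data taken from the file in the question.
sample = [
    "M  V30 BEGIN ATOM\n",
    "M  V30 1 C 7.3071 41.3785 19.7482 0\n",
    "M  V30 2 C 7.5456 41.3920 21.2703 0\n",
    "M  V30 END ATOM\n",
]

section = extract_section(sample, "M  V30 BEGIN ATOM", "M  V30 END ATOM")
print("\n".join(section))
```

With a real file, an open file object can be passed as `lines` in place of the list.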

Upvotes: 0

LeKhan9

Reputation: 1350

To keep the logic short and self-contained, and since you wanted a portable method:

def print_atoms_from_file(full_file_path):
    with open(full_file_path, 'r') as f:
        start_printing = False
        for line in f:

            if 'BEGIN ATOM' in line:
                start_printing = True
                continue

            if 'END ATOM' in line:
                start_printing = False
                continue

            if start_printing:
                print(line.rstrip('\n'))

print_atoms_from_file('test_file_name.txt')

Upvotes: 2

popen

Reputation: 53

This is how I'd do it (with csv).

import csv

def process_file(f):
    start_found = False
    content = []
    with open(f, 'r') as f_in:
        reader = csv.reader(f_in, delimiter=' ')
        for i, row in enumerate(reader):
            if set(['M', 'V30', 'BEGIN', 'ATOM']).issubset(row):
                start_found = True
                continue
            elif set(['M', 'V30', 'END', 'ATOM']).issubset(row):
                break
            elif start_found:
                content.append(row)
    return content
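
Because the file separates `M` and `V30` with two spaces, `csv.reader` with `delimiter=' '` yields an empty string in each row. If typed values are more useful than raw rows, a sketch like the following may help. It assumes each atom line has the layout seen in the question (index, element, x, y, z after the `M  V30` prefix). `parse_atom_line` is a hypothetical helper, not part of any library:

```python
def parse_atom_line(line):
    """Split one atom line into (index, element, x, y, z).

    Assumes the layout 'M  V30 <index> <element> <x> <y> <z> ...'
    as shown in the question's sample file.
    """
    # str.split() with no argument collapses runs of whitespace,
    # so the double space after 'M' produces no empty fields.
    fields = line.split()
    index, element = int(fields[2]), fields[3]
    x, y, z = (float(value) for value in fields[4:7])
    return index, element, x, y, z


atom = parse_atom_line("M  V30 1 C 7.3071 41.3785 19.7482 0")
print(atom)
```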

Upvotes: 1

anon

Reputation: 87

Try this:

with open('filename.txt','r') as f:
    ok_to_print = False
    for line in f:  # iterate lazily instead of loading every line at once
        line = line.strip()  # remove leading/trailing whitespace
        if line == 'M  V30 BEGIN ATOM':
            ok_to_print = True
        elif line == 'M  V30 END ATOM':
            ok_to_print = False
        else:
            if ok_to_print:
                print(line)

This processes the file line by line as it is read, which is ideal for big files that cannot fit in memory. For small files, you can read the whole thing into memory and use regular expressions:

import re
data = ''
with open('filename.txt','r') as f:
    data = f.read()
a = re.compile('M  V30 BEGIN ATOM(.+?)M  V30 END ATOM',re.I|re.M|re.DOTALL)
results = a.findall(data)
for result in results:
    print(result.strip())  # drop the newlines left around the captured block

Note: none of this code is tested; I am writing it blind.

Upvotes: 0

You can try this:

# Read file contents
with open("file.txt") as file:
    inside = False
    for line in file:
        # Start section of interest
        if line.rstrip() == "M  V30 BEGIN ATOM":
            inside = True
        # End section of interest
        elif line.rstrip() == "M  V30 END ATOM":
            inside = False
        # Inside section of interest
        elif inside:
            print(line.rstrip())
        else:
            pass
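
Since the question asks for a generic method, the same flag-based loop could be wrapped in a generator that takes the markers as arguments. This is only a sketch: `iter_section` and its parameter names are made up for illustration, and it assumes the markers match whole lines once trailing whitespace is removed.

```python
def iter_section(lines, start_marker, end_marker):
    """Yield the lines found between start_marker and end_marker."""
    inside = False
    for line in lines:
        text = line.rstrip()
        if text == start_marker:
            inside = True
        elif text == end_marker:
            inside = False
        elif inside:
            yield text


# Illustrative input; an open file object can be passed the same way.
sample = [
    "M  V30 BEGIN ATOM\n",
    "M  V30 1 C 7.3071 41.3785 19.7482 0\n",
    "M  V30 END ATOM\n",
    "M  V30 BEGIN BOND\n",
    "M  V30 0 1 1 2\n",
    "M  V30 END BOND\n",
]

atoms = list(iter_section(sample, "M  V30 BEGIN ATOM", "M  V30 END ATOM"))
bonds = list(iter_section(sample, "M  V30 BEGIN BOND", "M  V30 END BOND"))
for line in atoms:
    print(line)
```

Because it only iterates, the generator never holds more than one line in memory, and the same function covers the BOND block by changing the two marker strings.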

Upvotes: 3
