Reading file between headers in python

Question

I have a large text file whose values are separated by headers starting with "#". If a header matches a given condition, I would like to read the file from that header until the next "#" header and skip the rest of the file.

To test this, I'm trying to read the following text file, named test234.txt:

# abcdefgh
1fnrnf
mrkfr
nfoiernfr
nerfnr
# something
njndjen kj
ejkndjke
#vcrvr

The code I wrote is:

file_t = open('test234.txt')
cond = True
while cond:
    for line_ in file_t:
        print(line_)
        if file_t.read(1) == "#":
            cond = False
file_t.close()

But, the output I'm getting is:

# abcdefgh

fnrnf

rkfr

foiernfr

erfnr

something

jndjen kj

jkndjke

vcrvr

Instead, I would like only the lines between the first two "#" headers, which is:

1fnrnf
mrkfr
nfoiernfr
nerfnr

How can I do that? Thanks!

EDIT: "Reading in file block by block using specified delimiter in python" covers reading a file in groups separated by headers, but I don't want to read all the groups. I only want to read the group under the header where a given condition is met, and to stop reading the file as soon as the next header marked by '#' is reached.

hiro protagonist · Accepted Answer

itertools.groupby can help:

from io import StringIO
from itertools import groupby

text = '''# abcdefgh
1fnrnf
mrkfr
nfoiernfr
nerfnr
# something
njndjen kj
ejkndjke
#vcrvr'''


with StringIO(text) as file:
    lines = (line.strip() for line in file)  # removing trailing '\n'
    # startswith also copes with empty lines, where x[0] would raise IndexError
    for key, group in groupby(lines, key=lambda x: x.startswith('#')):

        if key is True:
            # found a line that starts with '#'
            print('found header: {}'.format(next(group)))

        if key is False:
            # group now contains all lines that do not start with '#'
            print('\n'.join(group))

note that all of this is lazy. you'd only ever have all the items between two headers in memory.
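
because everything is lazy, you can also stop as soon as the wanted section has been collected. a minimal sketch, assuming the section you want is identified by its exact header text (the names wanted, found, section and unread are mine, not from your code):

```python
from io import StringIO
from itertools import groupby

text = '''# abcdefgh
1fnrnf
mrkfr
nfoiernfr
nerfnr
# something
njndjen kj
ejkndjke
#vcrvr'''

wanted = '# abcdefgh'  # hypothetical condition: exact header text
found = False
section = []

with StringIO(text) as file:
    lines = (line.strip() for line in file)
    for is_header, group in groupby(lines, key=lambda x: x.startswith('#')):
        if is_header:
            # consecutive headers land in one group; the last one owns the data
            found = list(group)[-1] == wanted
        elif found:
            section = list(group)
            break  # stop pulling lines from the file
    unread = file.read()  # whatever groupby never asked for

print('\n'.join(section))
```

groupby has to look one line ahead to notice that a group has ended, so the '# something' header is consumed, but nothing after it is read by the loop.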

to read from your actual file, you'd have to replace the line with StringIO(text) as file: with the line with open('test234.txt', 'r') as file:

the output for your test is:

found header: # abcdefgh
1fnrnf
mrkfr
nfoiernfr
nerfnr
found header: # something
njndjen kj
ejkndjke
found header: #vcrvr

UPDATE, as i misunderstood the question. here is a fresh attempt (text is the same string as in the first snippet):

from io import StringIO
from collections import deque
from itertools import takewhile

from_line = '# abcdefgh'
to_line = '# something'

with StringIO(text) as file:
    lines = (line.strip() for line in file)  # removing trailing '\n'

    # fast-forward up to from_line
    deque(takewhile(lambda x: x != from_line, lines), maxlen=0)

    for line in takewhile(lambda x: x != to_line, lines):
        print(line)

where i use itertools.takewhile to get an iterator over the lines until a condition is met (until the given header is found, in your case).
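
if you do not know the text of the next header in advance, the second takewhile can test for any line starting with '#' instead. a sketch under that assumption (section_lines is a hypothetical helper name, and itertools.dropwhile is swapped in for the fast-forward step):

```python
from io import StringIO
from itertools import dropwhile, takewhile


def section_lines(lines, header):
    """Return an iterator over the stripped lines that follow `header`,
    stopping at the next line that starts with '#'."""
    stripped = (line.strip() for line in lines)
    # skip everything before the wanted header
    rest = dropwhile(lambda x: x != header, stripped)
    next(rest, None)  # drop the header itself (no-op if it was never found)
    # stop at the first line that looks like a header
    return takewhile(lambda x: not x.startswith('#'), rest)


text = '''# abcdefgh
1fnrnf
mrkfr
nfoiernfr
nerfnr
# something
njndjen kj
ejkndjke
#vcrvr'''

with StringIO(text) as file:
    # the iterator is lazy, so it has to be consumed while the file is open
    result = list(section_lines(file, '# abcdefgh'))

print('\n'.join(result))
```

if the header never occurs, the result is simply empty.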

the deque part is just the consume pattern suggested in the itertools recipes. it just fast-forwards to the point where the given condition does not hold anymore.
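
a small sketch of what that fast-forward does, using a made-up list of lines instead of a file. note that takewhile also swallows the first line that fails the test, which is why the header itself is not printed afterwards:

```python
from collections import deque
from itertools import takewhile

lines = iter(['# a', 'x', '# b', 'y', 'z', '# c'])  # made-up sample data

# maxlen=0 makes the deque throw every item away; it only drives the iterator
deque(takewhile(lambda x: x != '# b', lines), maxlen=0)

remaining = list(lines)
print(remaining)  # ['y', 'z', '# c']
```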
