Reputation: 1800
I have a large text file whose values are grouped under header lines starting with "#". If a header matches a given condition, I would like to read the file up to the next "#" header and SKIP the rest of the file.
To test this, I'm trying to read the following text file, named test234.txt:
# abcdefgh
1fnrnf
mrkfr
nfoiernfr
nerfnr
# something
njndjen kj
ejkndjke
#vcrvr
The code I wrote is:
file_t = open('test234.txt')
cond = True
while cond:
    for line_ in file_t:
        print(line_)
        if file_t.read(1) == "#":
            cond = False
file_t.close()
But, the output I'm getting is:
# abcdefgh
fnrnf
rkfr
foiernfr
erfnr
something
jndjen kj
jkndjke
vcrvr
Instead, I would like only the lines between the first two "#" headers, which are:
1fnrnf
mrkfr
nfoiernfr
nerfnr
How can I do that? Thanks!
EDIT: The question "Reading in file block by block using specified delimiter in python" covers reading a file in groups separated by headers, but I don't want to read all the groups. I only want to read the group under the header that meets a given condition, and to stop reading the file as soon as the next header marked by '#' is reached.
Upvotes: 4
Views: 1837
Reputation: 46859
itertools.groupby can help:
from io import StringIO
from itertools import groupby

text = '''# abcdefgh
1fnrnf
mrkfr
nfoiernfr
nerfnr
# something
njndjen kj
ejkndjke
#vcrvr'''

with StringIO(text) as file:
    lines = (line.strip() for line in file)  # remove the trailing '\n'
    # startswith('#') is also safe for empty lines
    for key, group in groupby(lines, key=lambda x: x.startswith('#')):
        if key:
            # found a line that starts with '#'
            print('found header: {}'.format(next(group)))
        else:
            # group now contains all lines that do not start with '#'
            print('\n'.join(group))
Note that all of this is lazy: at most the lines between two headers are held in memory at any time.
To read from your file instead, replace the line with StringIO(text) as file:
with with open('test234.txt', 'r') as file:
...
the output for your test is:
found header: # abcdefgh
1fnrnf
mrkfr
nfoiernfr
nerfnr
found header: # something
njndjen kj
ejkndjke
found header: #vcrvr
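Because groupby is lazy, the same idea can be bent to return only one block and stop there. The following is a minimal sketch, not a drop-in solution: the function name block_after_header and the exact-match comparison against the header are my own assumptions for illustration.

```python
from io import StringIO
from itertools import groupby


def block_after_header(lines, wanted):
    """Return the lines that follow the header `wanted`, up to the next '#' line."""
    stripped = (line.strip() for line in lines)
    take_next = False
    for is_header, group in groupby(stripped, key=lambda s: s.startswith('#')):
        if is_header:
            # consecutive headers form one group; only the last one has data under it
            take_next = list(group)[-1] == wanted
        elif take_next:
            # iteration stops here, so later blocks are never processed
            return list(group)
    return []


sample = StringIO('# abcdefgh\n1fnrnf\nmrkfr\n# something\nnjndjen kj\n#vcrvr\n')
print(block_after_header(sample, '# abcdefgh'))
```

With a real file you would pass the open file object instead of the StringIO sample.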
UPDATE: I misunderstood the question. Here is a fresh attempt:
from io import StringIO
from collections import deque
from itertools import takewhile

from_line = '# abcdefgh'
to_line = '# something'

# text is the same string as in the first snippet
with StringIO(text) as file:
    lines = (line.strip() for line in file)  # remove the trailing '\n'
    # fast-forward past from_line (takewhile also consumes the first non-matching line)
    deque(takewhile(lambda x: x != from_line, lines), maxlen=0)
    for line in takewhile(lambda x: x != to_line, lines):
        print(line)
Here I use itertools.takewhile to get an iterator over the lines until a condition is met (in your case, until to_line is reached).
The deque part is just the consume pattern suggested in the itertools recipes: it fast-forwards to the point where the given condition no longer holds.
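If you do not know the next header in advance, a variation is to skip with itertools.dropwhile and then stop at any line starting with '#'. This is only a sketch under my own assumptions: read_block and its matches parameter are hypothetical names, and the condition is passed in as a function of the header line.

```python
from io import StringIO
from itertools import dropwhile, takewhile


def read_block(lines, matches):
    """Return an iterator over the lines after the first header for which
    `matches(header)` is true, stopping at the next line starting with '#'."""
    stripped = (line.rstrip('\n') for line in lines)
    # skip everything before the wanted header
    rest = dropwhile(lambda s: not (s.startswith('#') and matches(s)), stripped)
    next(rest, None)  # drop the header itself (does nothing if it was never found)
    # stop at the next header, whatever it is called
    return takewhile(lambda s: not s.startswith('#'), rest)


sample = StringIO('# abcdefgh\n1fnrnf\nmrkfr\n# something\nnjndjen kj\n#vcrvr\n')
for line in read_block(sample, lambda header: 'abcdefgh' in header):
    print(line)
```

Everything stays lazy, so lines after the second header are never yielded by the iterator.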
Upvotes: 3
Reputation: 11
Learn and use regular expressions (regex). They will help you with all kinds of text-processing tasks.
import re  # regex library

with open('test234.txt') as f:  # file stream
    lines = f.readlines()  # reads all lines

p = re.compile('^#.*')  # pattern matching header lines
for l in lines:
    if p.match(l) is None:  # keep only non-header lines
        print(l.rstrip('\n'))  # drop the trailing newline
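The snippet above prints every non-header line. To keep only the block under one header and stop at the next one, the regex approach could be extended roughly as follows. Treat it as a sketch: lines_under and the HEADER pattern are names I made up, and I assume the header is compared by its text after the '#'.

```python
import re
from io import StringIO

HEADER = re.compile(r'^#\s*(.*)$')  # group 1 is the header text after '#'


def lines_under(lines, name):
    """Collect the lines under the header whose text equals `name`."""
    collected = []
    inside = False
    for raw in lines:
        line = raw.rstrip('\n')
        m = HEADER.match(line)
        if m:
            if inside:
                break  # next header reached: stop reading
            inside = m.group(1).strip() == name
        elif inside:
            collected.append(line)
    return collected


sample = StringIO('# abcdefgh\n1fnrnf\nmrkfr\n# something\nnjndjen kj\n#vcrvr\n')
print(lines_under(sample, 'abcdefgh'))
```

Iterating over the file object directly, instead of calling readlines(), avoids loading a large file into memory.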
Upvotes: 1