Reputation: 911
I am trying to parse a large text document and extract information from it using this code:
enumerated_journal = ""
with open('journal.0028.txt', 'r') as file_object:
    for line in enumerate(file_object.readlines()):
        enumerated_journal += str(line) + "\n"
for line in enumerated_journal.splitlines():
    if "jrn." in line.lower() and "username" in line.lower():
        print(line)
This code finds the lines in the text document that contain both of the strings I am using as filters. I would like to know how to also print a set number of lines before and after each matching line.
For example, if print(line) outputs
"Username: Christian"
I would like to print the lines before and after this line as well:
"User Data:"
"Username: Christian"
"Age: 23"
"Location: Texas"
I appreciate the help in advance! Let me know if I need to clarify anything.
Upvotes: 0
Views: 568
Reputation: 77337
You are reading the entire file into memory as a list and enumerating it, so all you have to do is use the current index into the list to grab the nearby lines. I changed your comparison to a regex, which should be a little faster (note that it expects "jrn." to appear before "username" on the line), and came up with:
import re
with open('journal.0028.txt', 'r') as file_object:
    lines = file_object.readlines()
for i, line in enumerate(lines):
    if re.search(r"jrn\..*username", line.lower()):
        for item in lines[max(i-2, 0):i+3]:
            print(item.rstrip())
Upvotes: 1
Reputation: 472
This should get you some part of the way there:
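numlinesbefore = 2
numlinesafter = 2
linesbefore = []  # rolling buffer of the most recent lines, oldest first
# read_journal_enumerated() is assumed to return the journal text as one string
for line in read_journal_enumerated('journal.0028.txt').splitlines():
    if numlinesbefore == -1:
        # a match was already printed; now print the lines that follow it
        if numlinesafter > 0:
            print(line)
            numlinesafter -= 1
        else:
            break
    elif "jrn." in line.lower() and "username" in line.lower():
        for bline in linesbefore:
            print(bline)
        print(line)
        numlinesbefore = -1  # flag: match found, switch to "after" mode
    else:
        linesbefore.append(line)
        if len(linesbefore) > numlinesbefore:
            linesbefore.pop(0)  # drop the oldest buffered line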
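The same rolling-buffer idea can be sketched with collections.deque, which drops the oldest entry automatically once maxlen is reached. This is only a sketch under assumptions: the function name first_match_with_context and its before/after parameters are made up for illustration, and it takes any iterable of lines (a list or an open file object) rather than the read_journal_enumerated() helper used above.

```python
from collections import deque
from itertools import islice


def first_match_with_context(lines, before=2, after=2):
    """Return the first matching line with up to `before`/`after` neighbours."""
    recent = deque(maxlen=before)  # keeps only the last `before` lines
    iterator = iter(lines)
    for line in iterator:
        lowered = line.lower()
        if "jrn." in lowered and "username" in lowered:
            result = list(recent) + [line]
            # Take up to `after` more lines from the same iterator.
            result.extend(islice(iterator, after))
            return result
        recent.append(line)
    return []  # no line matched


sample = [
    "a",
    "b",
    "User Data:",
    "jrn. Username: Christian",
    "Age: 23",
    "Location: Texas",
    "x",
]
for text in first_match_with_context(sample, before=1, after=2):
    print(text)
```

Because the function only iterates once and never indexes, it should also work when passed the file object directly, so the whole file does not have to be held in memory.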
Upvotes: 0
Reputation: 43169
You could use the indices instead (using enumerate()):
indices = [index
           for index, line in enumerate(read_journal_enumerated('journal.0028.txt').splitlines())
           if "jrn." in line.lower() and "username" in line.lower()]
Afterwards, you could use these to print the lines at index +/- x.
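A minimal sketch of that second step, assuming the lines are already available as a list. The name context_windows and the before/after parameters are illustrative and not part of the original post.

```python
def context_windows(lines, before=1, after=2):
    """Return, for each matching line, the slice of lines around it."""
    indices = [
        index
        for index, line in enumerate(lines)
        if "jrn." in line.lower() and "username" in line.lower()
    ]
    # Clamp the start at 0 so a match near the top does not wrap around
    # to the end of the list through a negative index.
    return [lines[max(index - before, 0):index + after + 1] for index in indices]


sample = [
    "User Data:",
    "jrn. Username: Christian",
    "Age: 23",
    "Location: Texas",
]
for window in context_windows(sample):
    for line in window:
        print(line)
```

Slicing past the end of a list is safe in Python, so only the lower bound needs clamping.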
Upvotes: 0