Reputation: 305
I have the following file:
This
is
a
testfile
wj5j keyword 1
WFEWF
O%LWJZ keyword 2
which
should
lpokpij keyword 3
123123das
kpmnvf keyword 4
just
contain
the
following
lines.
From it I need to delete the lines between keyword 1 and keyword 2, as well as between keyword 3 and keyword 4 (including the keyword lines themselves), so that it looks like this:
This
is
a
testfile
which
should
just
contain
the
following
lines.
I tried the following, which prints only the lines containing the keywords, but not the lines in between. My idea was that once I had all of those lines printed, I could delete them from the file.
with open("newfile_TEST1.txt", mode="r") as file:
    keywords = ['keyword 1', 'keyword 2', 'keyword 3', 'keyword 4']
    lines = file.readlines()
    for lineno, line in enumerate(lines, 1):
        matches = [k for k in keywords if k in line]
        if matches:
            print(line)
What can I do to improve my code?
Upvotes: 1
Views: 130
Reputation: 1
I used the split function of the re module.
With it I split the whole string into chunks, using the keyword lines as separators. I then kept only the chunks at even indices, because the chunks at odd indices are the data between two keywords that we want to drop, e.g. the pair ("keyword 1", "keyword 2") and the pair ("keyword 3", "keyword 4"). This left a few empty lines (since the odd chunks were skipped), so I used rstrip() to remove them.
import re

Lmatches = []
Loutput = []
patt = re.compile(r'\b.* keyword [1-4]')
with open("f1.txt", "r") as f:
    data = f.read()
matches = patt.split(data)
for i in range(len(matches)):
    if i % 2 == 0:
        Lmatches.append(matches[i])
for elem in Lmatches:
    Loutput.append(elem.rstrip())  # to remove empty lines
with open("output.txt", "w") as wfile:
    wfile.writelines(Loutput)
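The same regex idea can also be written as a substitution instead of a split. This is only a minimal sketch, assuming the keywords always come in opening/closing pairs and each keyword is the last thing on its line; the helper name remove_blocks and its pairs argument are made up for illustration:

```python
import re


def remove_blocks(text, pairs):
    """Return text with each block from an opening-keyword line
    through its closing-keyword line removed (hypothetical helper)."""
    for start, end in pairs:
        # ^.*start$ matches the whole opening line, [\s\S]*? lazily spans
        # the lines in between, and ^.*end$\n? consumes the closing line
        # together with its newline.
        pattern = re.compile(
            r"^.*" + re.escape(start) + r"$[\s\S]*?^.*" + re.escape(end) + r"$\n?",
            flags=re.MULTILINE,
        )
        text = pattern.sub("", text)
    return text
```

Because each block is removed together with its trailing newline, no rstrip() step is needed afterwards.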
Upvotes: 0
Reputation: 297
It's not really elegant, but you could do something like this:
with open("file.txt", mode="r") as file:
    lines = file.readlines()

keywords = ["keyword 1", "keyword 2", "keyword 3", "keyword 4"]
line = 0
to_keep = True
kept = []
while line < len(lines):
    has_keyword = any((keyword in lines[line] for keyword in keywords))
    if to_keep and not has_keyword:
        kept.append(lines[line])
    if has_keyword:
        to_keep = not to_keep
    line += 1

for line in kept:
    print(line, end="")

with open("newfile.txt", mode="w") as file:
    file.writelines(kept)
Output:
This
is
a
testfile
which
should
just
contain
the
following
lines.
Upvotes: 1
Reputation: 16556
This solution is for huge text files, when you don't want to load all the lines into memory with readlines() or similar.
keywords = ['keyword 1', 'keyword 2', 'keyword 3', 'keyword 4']
keywords_it = iter(keywords)
pair = (next(keywords_it), next(keywords_it))
write = True

with open("newfile_TEST1.txt") as f:
    for line in f:
        if not line.rstrip().endswith(pair[0]) and write:
            print(line, end='')
        elif line.rstrip().endswith(pair[1]):
            write = True
            try:
                pair = (next(keywords_it), next(keywords_it))
            except StopIteration:
                pass
        else:
            write = False
output:
This
is
a
testfile
which
should
just
contain
the
following
lines.
The idea is that we take one pair of keywords from the keywords list at a time, like ('keyword 1', 'keyword 2'). While iterating over the lines in the file, if a line does not end with the first item in the pair and the write flag is True, it is a normal line and should be printed. If it ends with the first item in the pair, we set the write flag to False, which means we stop writing.
Now if a line ends with the second item in the pair, it means we can start writing again after this line. So we get the next pair and set the write flag back to True.
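The steps described above can also be wrapped in a generator so that the result is written to a file instead of printed. This is only a sketch under the same assumptions (keywords end their line and appear in order); the name drop_between and its parameters are hypothetical:

```python
def drop_between(lines, pairs):
    """Yield the lines that fall outside every (start, end) keyword block.

    Hypothetical helper: `lines` is any iterable of strings (for example an
    open file) and `pairs` is an ordered list of (start, end) keywords.
    """
    pairs_it = iter(pairs)
    pair = next(pairs_it, None)
    skipping = False
    for line in lines:
        stripped = line.rstrip()
        if pair is None:
            # All pairs have been consumed: every remaining line is kept.
            yield line
        elif skipping:
            if stripped.endswith(pair[1]):
                # Closing keyword: resume after this line, fetch next pair.
                skipping = False
                pair = next(pairs_it, None)
        elif stripped.endswith(pair[0]):
            # Opening keyword: drop this line and start skipping.
            skipping = True
        else:
            yield line
```

With the input and output files opened in a single with statement as src and dst, calling dst.writelines(drop_between(src, pairs)) would copy the file line by line without ever holding it all in memory.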
Upvotes: 1
Reputation: 111
I would use a flag that is True from the first match until the next one, and False otherwise:
with open("./txt.txt", mode="r") as file:
    keywords = ['keyword 1', 'keyword 2', 'keyword 3', 'keyword 4']
    lines = file.readlines()
    glitch_flair = False
    for lineno, line in enumerate(lines, 1):
        matches = [k for k in keywords if k in line]
        if not matches and not glitch_flair:
            print(line, end='')
        elif matches:
            glitch_flair = not glitch_flair
Upvotes: 1