Reputation: 263
I have a very big text file and I want to filter out some lines. The file consists of identifier lines, each followed by many lines holding one number each, like this example:
fixedStep ch=GL000219.1 start=52818 step=1
1.000000
1.000000
1.000000
1.000000
1.000000
1.000000
1.000000
fixedStep ch=GL000320.1 start=52959 step=1
1.000000
1.000000
1.000000
fixedStep ch=M start=52959 step=1
1.000000
1.000000
This line is an identifier: fixedStep ch=GL000219.1 start=52818 step=1
I want to filter out all identifier lines containing ch=GL000219.1 or ch=GL000320.1, together with the number lines that follow them, and keep the other identifiers and the number lines below them. Each identifier is repeated multiple times.
The output should look like this:
fixedStep ch=M start=52959 step=1
1.000000
1.000000
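Each identifier line is the keyword fixedStep followed by space-separated key=value fields. A minimal sketch of splitting such a header into a dictionary, assuming the fields never contain spaces themselves (the helper name parse_header is hypothetical):

```python
def parse_header(line):
    """Split a 'fixedStep key=value ...' header into a dict of its fields."""
    keyword, *fields = line.split()
    if keyword != "fixedStep":
        raise ValueError("not a fixedStep header: " + line)
    # Each field looks like key=value; split on the first '=' only.
    return dict(field.split("=", 1) for field in fields)


header = parse_header("fixedStep ch=GL000219.1 start=52818 step=1")
print(header["ch"])     # GL000219.1
print(header["start"])  # 52818
```

Matching on the parsed ch value, rather than on a substring of the whole line, avoids accidentally matching a longer name that merely starts with the same characters.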
I have tried this code:
l = ["ch=GL000219.1", "ch=GL000320.1"] # since I have more identifiers that should be removed
with open('file.txt', 'r') as f:
    with open('outfile.txt', 'w') as outfile:
        good_data = True
        for line in f:
            if line.startswith('fixedStep'):
                for i in l:
                    good_data = i not in line
            if good_data:
                outfile.write(line)
My code does not return what I want. Do you know how to modify it?
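Tracing the inner loop of this attempt for a single header shows where it goes wrong: each pass overwrites good_data, so only the last entry of l decides the result. A minimal sketch using the first header from the example:

```python
l = ["ch=GL000219.1", "ch=GL000320.1"]
line = "fixedStep ch=GL000219.1 start=52818 step=1\n"

# Loop from the attempt: each iteration replaces the previous result.
for i in l:
    good_data = i not in line
print(good_data)  # True: effectively only "ch=GL000320.1" was checked

# Combined check: True only if no unwanted identifier appears in the header.
good_data = all(i not in line for i in l)
print(good_data)  # False
```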
Upvotes: 0
Views: 96
Reputation: 2036
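The problem is how good_data is computed. In the inner loop (for i in l: good_data = i not in line) every iteration overwrites the previous result, so only the last entry of l is actually checked. A ch=GL000219.1 header, for example, ends up with good_data = True because ch=GL000320.1 is not in it. Compute the flag once per identifier line instead, requiring that none of the unwanted identifiers appears in it.
You can write it like this: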
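l = ["ch=GL000219.1", "ch=GL000320.1"]  # identifiers to remove
flag = False  # lines before the first identifier are not written
with open('file.txt', 'r') as f, open('outfile.txt', 'w') as outfile:
    for line in f:
        if line.strip().startswith("fixedStep"):
            # keep this block only if none of the unwanted identifiers is in the header
            flag = all(i not in line for i in l)
        if flag:
            # write the header and every number line of a kept block
            outfile.write(line)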
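A self-contained sketch of the same idea wrapped in a function, so it can be tried on the example from the question without touching the disk. Two things go beyond the answer and are assumptions: the unwanted values are compared against the exact ch= field (so ch=GL000219.1 would not also drop a longer name such as ch=GL000219.10), and the function name filter_blocks is made up for illustration.

```python
import io


def filter_blocks(infile, outfile, unwanted):
    """Copy fixedStep blocks whose ch= value is not in `unwanted`."""
    keep = False  # lines before the first header are dropped
    for line in infile:
        if line.lstrip().startswith("fixedStep"):
            # Collect the key=value fields that follow the keyword.
            fields = dict(field.split("=", 1) for field in line.split()[1:])
            keep = fields.get("ch") not in unwanted
        if keep:
            outfile.write(line)


data = """fixedStep ch=GL000219.1 start=52818 step=1
1.000000
1.000000
fixedStep ch=GL000320.1 start=52959 step=1
1.000000
fixedStep ch=M start=52959 step=1
1.000000
1.000000
"""

out = io.StringIO()
filter_blocks(io.StringIO(data), out, {"GL000219.1", "GL000320.1"})
print(out.getvalue(), end="")
# fixedStep ch=M start=52959 step=1
# 1.000000
# 1.000000
```

With real files, pass the objects returned by open('file.txt') and open('outfile.txt', 'w') instead of the io.StringIO stand-ins. Using a set for the unwanted values keeps each lookup cheap even when the list of identifiers to remove grows.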
Upvotes: 1
Reputation: 79
If you read the file's contents into f (for example with f.read()), you need to split that string into lines before looping over it. Printing it with
print(f)
shows a single string, not a list of lines.
If it is a text file with Unix line endings, use
f = f.split("\n")
to convert the string into a list; then you can loop over it line by line.
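A minimal sketch of this string-based approach, with io.StringIO standing in for the real file. It also shows str.splitlines(), which does not leave the trailing empty string that split("\n") produces when the file ends with a newline, and which also recognises \r\n endings. Iterating directly over an open file object, as the question's code does, already yields one line at a time, so this splitting step only matters after reading the whole file with read().

```python
import io

# io.StringIO stands in for open('file.txt') so the example runs on its own.
f = io.StringIO("fixedStep ch=M start=52959 step=1\n1.000000\n1.000000\n")

text = f.read()  # the whole file as a single string

print(text.split("\n"))
# ['fixedStep ch=M start=52959 step=1', '1.000000', '1.000000', '']
print(text.splitlines())
# ['fixedStep ch=M start=52959 step=1', '1.000000', '1.000000']
```

The lines returned by either method have no trailing newline, so one has to be added back when writing them to the output file.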
Upvotes: 0