john

Reputation: 263

Removing part of a text file in Python

I have a very large text file and I want to filter out some lines. Each block starts with an identifier line, followed by many lines of numbers (one number per line), as in this example:

fixedStep ch=GL000219.1 start=52818 step=1
1.000000
1.000000
1.000000
1.000000
1.000000
1.000000
1.000000
fixedStep ch=GL000320.1 start=52959 step=1
1.000000
1.000000
1.000000
fixedStep ch=M start=52959 step=1
1.000000
1.000000

This line is an identifier: fixedStep ch=GL000219.1 start=52818 step=1. I want to filter out all identifier lines containing ch=GL000219.1 or ch=GL000320.1, together with the number lines that follow them, and keep the other identifiers and their number lines. Each identifier is repeated multiple times in the file. The expected output looks like this:

fixedStep ch=M start=52959 step=1
1.000000
1.000000

I have tried this code:

l = ["ch=GL000219.1", "ch=GL000320.1"] # since I have more identifiers that should be removed 

with open('file.txt', 'r') as f:
    with open('outfile.txt', 'w') as outfile:
        good_data = True
        for line in f:
            if line.startswith('fixedStep'):
                for i in l:
                    good_data = i not in line
            if good_data:
                outfile.write(line)

My code does not return what I want. Do you know how to modify it?

Upvotes: 0

Views: 96

Answers (2)

gushitong

Reputation: 2036

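The flag is computed incorrectly inside this loop:

for i in l:
    good_data = i not in line

good_data is overwritten on every pass, so only the last entry of l decides the result. For a header containing ch=GL000219.1, it is set to False on the first pass and back to True on the second, so that block is still written.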
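To see how the inner loop from the question behaves on a single header line, here is a small trace, offered as a sketch only. The helper name flag_from_question is hypothetical; the loop body is copied from the question. Because good_data is reassigned on every pass, the last entry of l is the only one that matters.

```python
l = ["ch=GL000219.1", "ch=GL000320.1"]


def flag_from_question(line):
    # Same inner loop as in the question, isolated for one header line.
    good_data = True
    for i in l:
        good_data = i not in line  # overwritten on each pass
    return good_data


# First identifier: marked bad, then reset by the second pass.
first = flag_from_question("fixedStep ch=GL000219.1 start=52818 step=1")
# Last identifier: the final pass sets the flag to False.
second = flag_from_question("fixedStep ch=GL000320.1 start=52959 step=1")
print(first, second)
```

Only the block for the last entry in l is removed; blocks for earlier entries slip through.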

You can write it like this:

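l = ["ch=GL000219.1", "ch=GL000320.1"]
flag = False  # nothing is written until the first header line is seen

with open('file.txt', 'r') as f, open('outfile.txt', 'w') as outfile:
    for line in f:
        if line.strip().startswith("fixedStep"):
            # keep this block only if no unwanted identifier appears in the header
            flag = all(i not in line for i in l)
        if flag:
            outfile.write(line)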
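The same idea can be wrapped in a generator so it can be tried on in-memory data before running it on the large file. This is a sketch, not part of the answer above: the names filter_blocks, unwanted, and sample are hypothetical, and it swaps the substring test for an exact comparison of whitespace-separated fields, assuming the header always looks like the one in the question. With a substring test, ch=GL000219.1 would also match a longer identifier such as ch=GL000219.10.

```python
def filter_blocks(lines, unwanted):
    """Yield lines whose most recent fixedStep header has no unwanted field."""
    unwanted = set(unwanted)
    keep = False  # lines before the first header are dropped
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "fixedStep":
            # Decide once per header; the decision holds until the next header.
            keep = unwanted.isdisjoint(fields)
        if keep:
            yield line


sample = [
    "fixedStep ch=GL000219.1 start=52818 step=1\n",
    "1.000000\n",
    "fixedStep ch=GL000320.1 start=52959 step=1\n",
    "1.000000\n",
    "fixedStep ch=M start=52959 step=1\n",
    "1.000000\n",
    "1.000000\n",
]
kept = list(filter_blocks(sample, ["ch=GL000219.1", "ch=GL000320.1"]))
print("".join(kept), end="")
```

With real files, the generator can be passed straight to outfile.writelines(filter_blocks(f, l)), which still reads the input one line at a time.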

Upvotes: 1

JerryLong

Reputation: 79

If you read the whole file into one string (for example with read()), you need to split that string into lines before looping over it. Use

print(f)

after reading into f, and you will see that it is a single string, not a list of lines.

If the file uses Unix line endings, use

f = f.split("\n")

to convert the string to a list, and then you can loop over it line by line.
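As a rough illustration of that split, assuming the file content is already held in one string, the sketch below compares split("\n") with str.splitlines(). The variable text is hypothetical in-memory content, and one of its lines deliberately uses a Windows line ending.

```python
# Hypothetical file content; the second line ends with "\r\n".
text = "fixedStep ch=M start=52959 step=1\n1.000000\r\n1.000000\n"

# split("\n") leaves the "\r" in place and adds an empty final item.
by_newline = text.split("\n")

# splitlines() handles both endings and adds no empty final item.
by_splitlines = text.splitlines()

print(by_newline)
print(by_splitlines)
```

For a very large file, looping over the open file object directly already yields one line at a time and avoids loading the whole content into memory.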

Upvotes: 0
