Rocket

Reputation: 553

Deleting specific lines from text file

I have a text file:

>E8|E2|E9D
Football is a good game
Its good for health
you can play it every day
>E8|E2|E10D
Sequence unavailable
>E8|E2|EKB
Cricket
>E87|E77|E10D
Sequence unavailable
>E27|E97|E10D
Sequence unavailable
>E8|E2|E9D
Sequence unavailable

I wrote the following code to detect the Sequence unavailable entries in this file and delete them:

import os

with open('input.txt') as f1, open('output.txt', 'w') as f2,\
                                                  open('temp_file','w') as f3:
    lines = []       # store lines between two `>` in this list
    for line in f1:
        if line.startswith('>'):
            if lines:
                f3.writelines(lines)
                lines = [line]
            else:
                lines.append(line)
        elif line.rstrip('\n') == 'Sequence unavailable':
            f2.writelines(lines + [line])
            lines = []
        else:
            lines.append(line)

    f3.writelines(lines)

os.remove('input.txt')
os.rename('temp_file', 'input.txt')

But what I actually want is to delete every sequence for a given question (the last column of the > lines) whenever any entry for that question is Sequence unavailable.

For example, even though the first E9D entry has lines following it, another E9D entry has Sequence unavailable, so no E9D entries should be written to the output file:

input.txt

>E8|E2|E9D
Football is a good game
Its good for health
you can play it every day
>E8|E2|E10D
Sequence unavailable
>E8|E2|EKB
Cricket
>E87|E77|E10D
Sequence unavailable
>E27|E97|E10D
Sequence unavailable
>E8|E2|E9D
Sequence unavailable

output.txt

>E8|E2|EKB
Cricket

Here only the EKB question had entries.

Upvotes: 0

Views: 423

Answers (2)

falsetru

Reputation: 368894

def get_name(line):
    return line[1:].rsplit('|', 1)[-1].strip()

with open('input.txt') as f, open('output.txt', 'w') as fout:
    name = ''

    # Phase 1: Find the names that have an unavailable sequence
    unavailable = set()
    for line in f:
        if line.startswith('>'):
            name = get_name(line)
        else:
            if 'Sequence unavailable' in line:
                unavailable.add(name)

    # Phase 2: Write only the records whose name is never unavailable
    f.seek(0)
    keep = False
    for line in f:
        if line.startswith('>'):
            name = get_name(line)
            keep = name not in unavailable
        if keep:
            fout.write(line)
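
The same two phases can be sketched as a function over an in-memory list of lines, which makes it easy to check against the sample from the question. This is only a sketch: `filter_available` and `sample` are hypothetical names used for illustration, and it assumes the whole file fits in memory.

```python
def filter_available(lines):
    """Return the lines of every record whose name (last header column)
    never appears with 'Sequence unavailable' in the input."""
    def get_name(header):
        # Last '|'-separated field of a '>' header line, e.g. 'E9D'
        return header[1:].rsplit('|', 1)[-1].strip()

    # Phase 1: collect every name that has an unavailable record
    unavailable = set()
    name = ''
    for line in lines:
        if line.startswith('>'):
            name = get_name(line)
        elif 'Sequence unavailable' in line:
            unavailable.add(name)

    # Phase 2: keep header and body lines only for the remaining names
    kept = []
    keep = False
    for line in lines:
        if line.startswith('>'):
            keep = get_name(line) not in unavailable
        if keep:
            kept.append(line)
    return kept


# Sample data copied from the question (hypothetical variable name)
sample = [
    '>E8|E2|E9D\n',
    'Football is a good game\n',
    'Its good for health\n',
    'you can play it every day\n',
    '>E8|E2|E10D\n',
    'Sequence unavailable\n',
    '>E8|E2|EKB\n',
    'Cricket\n',
    '>E87|E77|E10D\n',
    'Sequence unavailable\n',
    '>E27|E97|E10D\n',
    'Sequence unavailable\n',
    '>E8|E2|E9D\n',
    'Sequence unavailable\n',
]

result = filter_available(sample)
print(''.join(result), end='')
```

With the sample above, only the EKB header and its Cricket line are kept, because both E9D and E10D appear at least once with Sequence unavailable.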

Upvotes: 1

Abhishek dot py

Reputation: 939

You can follow an alternative, simpler approach. Instead of deleting the line, you can replace its text with "".

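import fileinput

# inplace=True redirects print() into input.txt, so the file is rewritten
# line by line as it is read
for line in fileinput.input('input.txt', inplace=True):
    line = line.replace("Sequence unavailable", "")
    line = line.replace("\n", "")
    # Skip lines that became empty; print() adds the newline back
    if line:
        print(line)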

I haven't executed this code, but I think the logic is correct. Note that the second replace is needed because every line ends with a newline.

Upvotes: 0
