Jessica

Reputation: 3173

write lines to file and remove python

I have a data file with many lines, for example:

line1
line2
line3
line4
line5
line6
No D
line7
line8
line9
line10
line11
line12
...

Whenever the program sees a 'No D' line, I want it to write the 4 lines before it, the 'No D' line itself, and the 2 lines after it to a file called "NoDregion.txt", and write everything else to a file called "WithDregion.txt".

My code:

lines =open("file.txt", "r").read().splitlines()
manualintervf = open("NoDregion.txt", "w")
goodlines = open("withDregion.txt", "w")

for i, line in enumerate(lines):
    if ">No D" in line:
        manualintervf.write(lines[i-4]+"\n"+lines[i-3]+"\n"+lines[i-2]+"\n"+lines[i-1]+"\n"+lines[i]+"\n"+lines[i+1]+"\n"+lines[i+2]+"\n")
    else:
        goodlines.write(line+"\n")

The "NoDregion.txt" output (no problem with this):

line3
line4
line5
line6
No D
line7
line8

The "withDregion.txt" output (not what I want: only the 'No D' line is removed and everything else is kept, including the 4 lines before 'No D' and the 2 lines after it):

line1
line2
line3
line4
line5
line6
line7
line8
line9
line10
line11
line12
...

The desired output for "WithDregion.txt" would be:

line1
line2
line9
line10
line11
line12
...

I am not sure how to write this so that the lines written to "NoDregion.txt" are removed and do not also end up in "withDregion.txt".

Upvotes: 0

Views: 208

Answers (3)

dawg

Reputation: 104032

Since you are reading the entire file into memory, you may as well use a regex on the string from the file.

Try:

import re

with open(fn) as f:
    txt=f.read()
    txt=re.sub(r'(?:^.*\s){4}^No D.*$\s(?:^.*\s){2}', '', txt, flags=re.M)
    # now just write txt to the output file...
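If you also need the removed regions for "NoDregion.txt", the same pattern can collect them before substituting. This is only a sketch: the helper name split_regions is made up for illustration, and it assumes every line (including the last) ends with a newline and that each 'No D' line has 4 full lines before it and 2 after it.

```python
import re

# Same pattern as above, compiled once so it can be reused.
REGION = re.compile(r'(?:^.*\s){4}^No D.*$\s(?:^.*\s){2}', flags=re.M)

def split_regions(txt):
    """Hypothetical helper: return (removed_text, remaining_text)."""
    # Concatenate every matched 7-line region, in order of appearance.
    removed = "".join(m.group(0) for m in REGION.finditer(txt))
    # Delete those same regions from the original text.
    remaining = REGION.sub("", txt)
    return removed, remaining
```

The first string would go to "NoDregion.txt" and the second to "withDregion.txt".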

Or you can find each 'No D' line in the list of file lines and slice its region out, like so:

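with open(fn) as f:  # fn is the path to your input file
    lines=f.readlines()

while True:
    try:
        idx=lines.index('No D\n')
    except ValueError:
        # no 'No D' lines left
        txt=''.join(lines)
        break
    # drop the 4 lines before, the 'No D' line itself, and the 2 lines after
    lines=lines[:max(idx-4, 0)]+lines[idx+3:]

# again, just write txt out to the output file....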

Upvotes: 0

rofls

Reputation: 5115

Use this instead of your if/else:

for i, line in enumerate(lines):
    if ">No D" in line:
        start = max(i-4, 0)
        manualintervf.write("\n".join(lines[start:i+3])+"\n")
        # everything outside the region; assumes a single ">No D" line
        goodlines.write("\n".join(lines[:start]+lines[i+3:])+"\n")
        break

This uses the join method for strings. Try "separator".join(str(i) for i in range(10)) to get a feel for how it works. Basically, it takes a list (or any iterable of strings) and joins the elements together, with "separator" (or whatever string you call it on) between each pair of elements.
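A small illustration of join combined with list slicing, which is the combination used to write a block of lines at once. The sample list here is made up for the example.

```python
# str.join places the separator between items; it does not add one at the end.
digits = ", ".join(str(i) for i in range(5))

lines = ["line1", "line2", "line3", "line4"]
# Join a slice of lines with newlines, then add the final newline by hand.
block = "\n".join(lines[1:3]) + "\n"

print(digits)
print(repr(block))
```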

Upvotes: 0

Silenced Temporarily

Reputation: 1004

You're seeing this behavior because the lines near "No D" do not themselves contain "No D", so they fall through to your "else" clause.

The most straightforward way to do this is with two passes through the lines. Rather than immediately writing each line to one file or the other, first flag every line that falls inside a "No D" region, then write the flagged lines to "NoDregion.txt" and the non-flagged lines to "withDregion.txt". Something like this:

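lines = open("S011_PCR_ABCDEF-AnnotatedVDJAlignments.fasta", "r").read().splitlines()
manualintervf = open("NoDregion.txt", "w")
goodlines = open("withDregion.txt", "w")

d_before = 4  # lines to capture before each "No D" line
d_after = 2   # lines to capture after each "No D" line
d_flag = [0] * len(lines)

# First pass: flag every line that falls inside a "No D" region.
for i, line in enumerate(lines):
    if "No D" in line:
        dstart = max(i - d_before, 0)
        dend = min(i + d_after + 1, len(lines))  # exclusive end
        for j in range(dstart, dend):
            d_flag[j] = 1

# Second pass: send each line to the right file.
for i, line in enumerate(lines):
    if d_flag[i] == 1:
        manualintervf.write(line + "\n")
    else:
        goodlines.write(line + "\n")

manualintervf.close()
goodlines.close()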

Whatever method you use, you'll also want to make sure your code doesn't fail if "No D" is the first or last line of your file. With your original code, if "No D" is the first line, you'll be writing lines[-4] through lines[-1] to the NoDregion file; negative indices wrap around to the end of the list, which is almost certainly not what you want. If it's the last line, lines[i+1] refers to a nonexistent line and raises an IndexError, which is absolutely not what you want.
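To check those boundary cases without touching any files, the flagging logic can be pulled into a small function and run on short lists. This is a sketch only; the function name split_around_marker and its parameters are invented for illustration.

```python
def split_around_marker(lines, marker="No D", before=4, after=2):
    """Hypothetical helper: return (region_lines, remaining_lines)."""
    flagged = set()
    for i, line in enumerate(lines):
        if marker in line:
            # Clamp the window so it never runs off either end of the list.
            start = max(i - before, 0)
            end = min(i + after + 1, len(lines))  # exclusive end
            flagged.update(range(start, end))
    region = [line for i, line in enumerate(lines) if i in flagged]
    rest = [line for i, line in enumerate(lines) if i not in flagged]
    return region, rest

# Marker on the first line: nothing wraps around from the end of the list.
print(split_around_marker(["No D", "a", "b", "c"]))
# Marker on the last line: no IndexError.
print(split_around_marker(["a", "b", "c", "d", "e", "No D"]))
```

Using a set of indices also means overlapping regions (two "No D" lines close together) are written only once.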

Upvotes: 1
