john

Reputation: 263

Changing the contents of a text file and making a new file with same format

I have a big text file made of many parts. Every part has 4 lines, and the next part starts immediately after the last line of the previous one. The first line of each part starts with @, the 2nd line is a sequence of characters, the 3rd line is a +, and the 4th line is again a sequence of characters.

Small example:

@M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACGCTTATCGATAAAATTTTGAATTTTGTAACTTGTTTTTGTAATTCTTTAGTTTGTATGTCTGTTGCTATTATGTCTACTATTCTTTCCCCTGCACTGTACCCCCCAATCCCCCCTTTTCTTTTAAAAGTTAACCGATACCGTCGAGATCCGTTCACTAATCGAACGGATCTGTCTCTGTCTCTCTC
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5AEG1EF511F1?GFH3@BFADGD55F?@GFHFGGFCGG/GHGHHHHHHHDBG4E?FB?BGHHHHHHHHHHHHHHHHHFHHHHHHHHHGHGHGHHHHHFHHHHHGGGGHHHHGGGGHHHHHHHGHGHHHHHHFGHCFGGGHGGGGGGGGFGGEGBFGGGGGGGGGFGGGGFFB9/BFFFFFFFFFF/
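Records with the layout described above can be read in 4-line groups. A minimal sketch, assuming every record is exactly 4 lines with no wrapped sequence lines; `read_records` is a hypothetical helper name, not part of any library, and an in-memory `io.StringIO` stands in for the real file:

```python
import io
from typing import Iterator, TextIO, Tuple


def read_records(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality) tuples with newlines stripped."""
    while True:
        header = handle.readline()
        if not header:
            return  # readline() returns "" at end of file
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        # Check the structural markers: "@" on line 1, "+" on line 3
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record starting with %r" % header)
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n")
for record in read_records(example):
    print(record)
```

A truncated final record makes the "+" check fail, so the sketch raises `ValueError` instead of silently emitting a partial record.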

I want to change the 2nd and the 4th line of each part and make a new file with the same structure (4 lines for each part). Specifically, I want to keep the first 65 characters of lines 2 and 4 and remove the rest. The expected output for the small example would look like this:

@M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACG
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5A
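The transformation on a single record can be sketched as follows. `truncate_record` is a hypothetical helper that assumes the 4 lines are already split and have no trailing newlines; the sample sequence and quality lines are prefixes of the small example above, shortened for readability. Cutting both lines to the same length keeps each quality character aligned with its base.

```python
def truncate_record(lines, n=65):
    """Return a copy of a 4-line record with lines 2 and 4 cut to n characters."""
    header, seq, plus, qual = lines
    return [header, seq[:n], plus, qual[:n]]


# Shortened prefixes of the question's small example
record = [
    "@M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG",
    "TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACGCTTATCGATA",
    "+",
    "BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5AEG1EF511F1",
]

for line in truncate_record(record):
    print(line)
```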

I wrote the following code:

infile = open("file.fastq", "r")
new_line=[]
for line_number in len(infile.readlines()):
    if line_number ==2 or line_number ==4:
        new_line.append(infile[line_number])

with open('out_file.fastq', 'w') as f:
    for item in new_line:
        f.write("%s\n" % item)

but it does not produce the output I want. How can I fix it to get the expected output?
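The code in the question has several problems: `len(infile.readlines())` is an integer, so `for ... in` over it raises `TypeError`; a file object cannot be indexed like `infile[line_number]`; Python indices start at 0, so `2` and `4` point at the wrong lines; and even with a working loop, the test would match only the first record, while lines 1 and 3 would never be written. A minimal streaming sketch that avoids loading the whole file into memory; `truncate_lines` and `truncate_fastq` are hypothetical names, and the 0-based check `i % 4 in (1, 3)` selects the 2nd and 4th line of every record:

```python
from typing import Iterable, Iterator


def truncate_lines(lines: Iterable[str], n: int = 65) -> Iterator[str]:
    """Yield the lines unchanged, except lines 2 and 4 of each 4-line record,
    which are cut to their first n characters."""
    for i, line in enumerate(lines):
        if i % 4 in (1, 3):  # 0-based: 2nd and 4th line of each record
            # Strip the newline first so slicing cannot cut it off, then restore it
            line = line.rstrip("\n")[:n] + "\n"
        yield line


def truncate_fastq(in_path: str, out_path: str, n: int = 65) -> None:
    """Stream in_path to out_path one line at a time."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        outfile.writelines(truncate_lines(infile, n))
```

With the question's file names, the call would be `truncate_fastq("file.fastq", "out_file.fastq")`. Keeping the line logic in a generator separate from the file handling makes it easy to check on a plain list of strings.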

Upvotes: 1

Views: 56

Answers (3)

ipramusinto

Reputation: 2648

readlines() returns a list containing every line of your file, so you don't need to build a separate new_line list. Iterate directly over the index-value pairs of that list (with enumerate), and modify the values at the positions you want.

Here is your code, modified:

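# Read all lines; each one keeps its trailing "\n"
with open("file.fastq", "r") as infile:
    new_lines = infile.readlines()

for i, t in enumerate(new_lines):
    # Every record has 4 lines, so the 2nd and 4th lines have i % 4 == 1 or 3
    if i % 4 in (1, 3):
        # Strip the newline before slicing so it is not cut off, then restore it
        new_lines[i] = t.rstrip("\n")[:65] + "\n"

with open('out_file.fastq', 'w') as f:
    f.writelines(new_lines)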
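`enumerate` pairs each line returned by `readlines()` with its 0-based index, so the sequence and quality lines of every record sit at indices where `i % 4` is 1 or 3. A small check, using an in-memory `io.StringIO` with two made-up records in place of the real file:

```python
import io

fake_file = io.StringIO("@r1\nAAAA\n+\nIIII\n@r2\nCCCC\n+\nJJJJ\n")
lines = fake_file.readlines()

# Indices of the 2nd and 4th line of every 4-line record
targets = [i for i, _ in enumerate(lines) if i % 4 in (1, 3)]
print(targets)  # [1, 3, 5, 7]
print([lines[i].rstrip("\n") for i in targets])  # ['AAAA', 'IIII', 'CCCC', 'JJJJ']
```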

Upvotes: 1

smassey

Reputation: 5931

I think itertools.cycle works nicely here:

import itertools

with open("transformed.file.fastq", "w") as output_file:
    with open("file.fastq", "r") as input_file:
        # Label each line with its position (1-4) inside the current record
        for i in itertools.cycle((1, 2, 3, 4)):
            line = input_file.readline().strip()
            if not line:  # readline() returns "" at end of file
                break
            if i in (2, 4):  # sequence and quality lines
                line = line[:65]
            output_file.write("{}\n".format(line))
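A variation on the same idea, offered as a sketch rather than the answerer's code: pairing `itertools.cycle` with the file iterator through `zip` ends the loop exactly when the file runs out, so no explicit end-of-file check is needed and a blank line in the input would not stop processing early. `truncate_stream` is a hypothetical name, and in-memory `io.StringIO` objects stand in for the real files:

```python
import io
import itertools


def truncate_stream(input_file, output_file, n=65):
    # zip stops as soon as the file iterator is exhausted
    for i, line in zip(itertools.cycle((1, 2, 3, 4)), input_file):
        line = line.rstrip("\n")
        if i in (2, 4):  # sequence and quality lines
            line = line[:n]
        output_file.write(line + "\n")


src = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n")
dst = io.StringIO()
truncate_stream(src, dst, n=4)
print(dst.getvalue())
```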

Upvotes: 2

jar

Reputation: 2908

This code does what you want:

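from itertools import islice

# Open the output once, in "w" mode, instead of re-opening it in "a+" for every record
with open('bio.txt', 'r') as infile, open('mod_bio.txt', 'w') as f:
    while True:
        # Take the next 4 lines (one record) from the file iterator
        lines = list(islice(infile, 4))
        if not lines:
            break  # end of file
        a, b, c, d = lines
        # Strip the newline before slicing so short lines don't get a doubled "\n"
        b = b.rstrip('\n')[:65] + '\n'
        d = d.rstrip('\n')[:65] + '\n'
        f.write(a + b + c + d)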

How it works:
islice takes 4 lines at a time from the file, one record per iteration as you describe. We unpack those lines into a, b, c, d, slice the second and fourth lines to 65 characters, and finally concatenate the four strings and write them to the new file.
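The `while True` / `break` loop around `islice` can also be written with the two-argument form of the built-in `iter(callable, sentinel)`, which keeps calling the function until it returns the sentinel (here the empty list produced at end of file). A minimal sketch; `records` is a hypothetical helper name and `io.StringIO` stands in for the real file:

```python
import io
from itertools import islice


def records(handle, size=4):
    # Call the lambda repeatedly; stop when it returns [] (file exhausted)
    return iter(lambda: list(islice(handle, size)), [])


src = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nJJ\n")
for rec in records(src):
    print(rec)
```

If the file ends with an incomplete record, the last list is shorter than 4, and unpacking it into a, b, c, d raises `ValueError`, which is usually preferable to silently writing a broken record.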

Upvotes: 2
