filtering a complex text file in bash

Question

I have a text file like this:

@M00872:408:000000000-D31AB:1:1102:15653:1337 1:N:0:ATCACG
CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTTAAGCA
+
BCCBGGGGGGGGGGHHHHGGGGGGGGGGGGGGGHHHHHHHHH
@M00872:408:000000000-D31AB:1:1102:15388:1343 1:N:0:ATCACG
CGCGACCTCATGAATTTAAGGGCGACCCGCTGAATTTAAGCA
+
CBBBGGGGGGGGGGHHHHGGGGGGGGGGGGGGGHHHHHGHHH

Every four lines form one group, and the first line of each group starts with @. The second line of each group is the one I care about, so I would like to filter the groups based on it. Specifically, if the sequence "GATCAGACGTGGCGAC" is present in the second line, I want to remove the whole group and write the remaining groups to a new file. So the result for this example is:

@M00872:408:000000000-D31AB:1:1102:15388:1343 1:N:0:ATCACG
CGCGACCTCATGAATTTAAGGGCGACCCGCTGAATTTAAGCA
+
CBBBGGGGGGGGGGHHHHGGGGGGGGGGGGGGGHHHHHGHHH

I tried the following command, but it returns only the second lines, and only those that contain this sequence. What I want is the whole group, and only when its second line does not contain the sequence.

grep -i GATCAGACGTGGCGAC myfile.txt > output.txt

Do you know how to fix it?
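
One possible approach, offered as a minimal sketch rather than a definitive answer: buffer each four-line group in awk and print it only when its second line does not contain the sequence. This assumes every group is exactly four lines, the file has no blank or stray lines between groups, and the match should be case-insensitive (as with grep -i above). The file names myfile.txt and output.txt are taken from the command in the question; the sample data is recreated here only so the sketch runs on its own.

```shell
#!/usr/bin/env bash
# Recreate the two-group sample from the question so the sketch is self-contained.
cat > myfile.txt <<'EOF'
@M00872:408:000000000-D31AB:1:1102:15653:1337 1:N:0:ATCACG
CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTTAAGCA
+
BCCBGGGGGGGGGGHHHHGGGGGGGGGGGGGGGHHHHHHHHH
@M00872:408:000000000-D31AB:1:1102:15388:1343 1:N:0:ATCACG
CGCGACCTCATGAATTTAAGGGCGACCCGCTGAATTTAAGCA
+
CBBBGGGGGGGGGGHHHHGGGGGGGGGGGGGGGHHHHHGHHH
EOF

seq="GATCAGACGTGGCGAC"

# Assumption: groups are strictly 4 lines each, so NR % 4 gives the position in the group.
awk -v seq="$seq" '
    # Line 1 of a group: start a fresh buffer and assume the group is kept.
    NR % 4 == 1 { rec = $0; keep = 1; next }
    # Line 2 of a group: drop the group if it contains the sequence (case-insensitive).
    NR % 4 == 2 { if (index(toupper($0), toupper(seq))) keep = 0 }
    # Lines 2 to 4: append the current line to the buffer.
    { rec = rec "\n" $0 }
    # Line 4 of a group: the group is complete, so print it if it was not rejected.
    NR % 4 == 0 { if (keep) print rec }
' myfile.txt > output.txt
```

With the sample above, output.txt should hold only the second group. Checking only the second line of each group (rather than every line) avoids accidentally dropping a group whose quality line happens to contain the same characters.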
