Reputation: 29
I have a fastq file in which the first 8 lines look like this:
@SRR21388627.2845086/1
GCTGCAGTTGCTGCTGTTGCTGCTGCTGGGGCAGCACACCAGGATGGCCGGCGCCCCCG
+
FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FF
@SRR21388627.2707233/1
GCTGCAGTTGCTGCTGTTGCTGCTGCTGGGGCAGCACACCAGGATGGCCGGCGCCCCCG
+
FFFF:FF,:FFFF,FF,F:FFFFF:,F,,:,FF:,:,FFF:::F:,,FFF:::,FF:::
I would like to take the DNA sequence and save each sequence as a new file named with the line before the sequence, such as "SRR21388627.2845086.1.fq", where the @
is removed, and the /
is replaced with .
So far I came up with a command with reference from this post, which doesn't work yet, because I am not sure how to remove @
and replace /
in awk
cat deltaQ_1_region_1.fq | paste - - - - | cut -f1,2 |
awk -F'\t' '$1!=prev{close(out); out=$1".fq"; prev=$1} {sub(/[^\t]+\t/,""); print > out}' file
The expected output for SRR21388627.2845086.1.fq is:
GCTGCAGTTGCTGCTGTTGCTGCTGCTGGGGCAGCACACCAGGATGGCCGGCGCCCCCG
Upvotes: 1
Views: 233
Reputation: 36700
Note: this answer show how to get desired result with minimal alteration of existing code. Generally it is not necessary to use cut
and awk
in one pipeline as 2nd can easily handle columns and often cut part might be eliminated, as shown in other answers.
how to remove
@
and replace/
inawk
sub
String Function can be used for that purpose. As you store filename in variable named out
, you should pass that as 3rd (optional) argument to sub
functions so it changes out
, namely replace this action
{close(out); out=$1".fq"; prev=$1}
using
{close(out); out=$1".fq"; sub(/@/,"",out); sub(/\//,".",out); prev=$1}
Upvotes: 2
Reputation: 17216
You have 4 lines per "record" so you can use a modulo 4 to differentiate the "fields" in awk
(instead of joining them with a paste
before calling awk):
awk '
NR%4 == 1 { sub("/","."); file_out = substr($0,2) ".fq"; next }
NR%4 == 2 { print > file_out; close(file_out) }
' file.fq
Upvotes: 4
Reputation: 35256
Assuming the number of lines that make up a 'DNA sequence' could vary:
awk '
/^@/ { close(outfile) # close previous output file
gsub(/@/,"") # remove all "@" characters from current line
gsub(/[/]/,".") # replace all "/" with "." on current line
outfile = $0 ".fq" # define new output file name
next # skip to next line of input
}
{ print > outfile } # write current line to output file
' deltaQ_1_region_1.fq
Results:
$ head SRR*fq
==> SRR21388627.27072331.fq <==
GCTGCAGTTGCTGCTGTTGCTGCTGCTGGGGCAGCACACCAGGATGGCCGGCGCCCCCG
+
FFFF:FF,:FFFF,FF,F:FFFFF:,F,,:,FF:,:,FFF:::F:,,FFF:::,FF:::
==> SRR21388627.28450861.fq <==
GCTGCAGTTGCTGCTGTTGCTGCTGCTGGGGCAGCACACCAGGATGGCCGGCGCCCCCG
+
FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FF
Upvotes: 3