Reputation: 1
I am trying to use "awk" to extract text blocks (first field/column only; each block spans multiple lines, and the number of lines varies between blocks) delimited by separators (# and --). The first column holds sequence IDs.
Using "awk" I am able to separate the blocks and print the first column, but I cannot redirect these text blocks to separate output files.
Code:
awk '/#/,/--/{print $1}' OTU_test.txt
Ideally, I would like to save each text block (excluding the separators) to its own file, named after text found in the first line of the block (e.g. MEMB.nem.6; the MEMB.nem. part is constant, but the number changes).
Example of input file:
#OTU_MEMB.nem.6
EF494252.1.2070 6750.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Nucletmycea;D_3__Fungi;D_7__Dothideomycetes;D_8__Capnodiales;D_9__uncultured fungus 1.000
FJ235519.1.1436 5957.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Nucletmycea;D_3__Fungi;D_7__Dothideomycetes;D_8__Capnodiales;D_9__uncultured fungus 1.000
New.ReferenceOTU9219 5418.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Nucletmycea;D_3__Fungi 1.000
GQ120120.1.1635 471.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Nucletmycea;D_3__Fungi;D_7__Dothideomycetes;D_8__Capnodiales;D_9__uncultured fungus 0.990
--
#OTU_MEMB.nem.163
New.CleanUp.ReferenceOTU59580 12355.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_7__Chromadorea;D_8__Monhysterida 0.700
New.ReferenceOTU11809 1312.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_7__Chromadorea;D_8__Monhysterida 0.770
--
#OTU_MEMB.nem.35
New.CleanUp.ReferenceOTU120578 12116.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_7__Chromadorea;D_8__Desmoscolecida;D_9__Desmoscolex sp. DeCoSp2 0.780
Expected output files (first column only, no separators).
MEMB.nem.6.txt
EF494252.1.2070
FJ235519.1.1436
New.ReferenceOTU9219
GQ120120.1.1635
MEMB.nem.163.txt
New.CleanUp.ReferenceOTU59580
New.ReferenceOTU11809
MEMB.nem.35.txt
New.CleanUp.ReferenceOTU120578
I have searched a lot, but so far I have been unsuccessful. I would be very happy if someone could advise me.
Thanks,
Tiago
Upvotes: 0
Views: 703
Reputation: 204426
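awk '
    # Header line: sub() strips the "#OTU_" prefix and returns 1, so this block runs.
    sub(/^#OTU_/,"") {
        close(out)          # close the output file of the previous block
        out = $0 ".txt"     # e.g. MEMB.nem.6.txt
        next
    }
    # Any other line except the "--" separator: write the first field.
    !/^--/ {
        print $1 > out
    }
' file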
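How this works, roughly: sub() returns the number of substitutions it made, so it doubles as the pattern that detects a header line while also trimming it down to the file name. close(out) releases the previous output file, which matters when there are many blocks, since some awk implementations limit how many files can be open at once. The following is a minimal sketch for trying the approach end to end. The scratch directory, the shortened taxonomy strings in the sample, and the `out != ""` guards are assumptions added here for illustration; they are not part of the original script.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Abbreviated sample: same layout as the input in the question,
# with the taxonomy column shortened for readability.
cat > OTU_test.txt <<'EOF'
#OTU_MEMB.nem.6
EF494252.1.2070 6750.0 D_0__Eukaryota 1.000
FJ235519.1.1436 5957.0 D_0__Eukaryota 1.000
--
#OTU_MEMB.nem.163
New.CleanUp.ReferenceOTU59580 12355.0 D_0__Eukaryota 0.700
--
#OTU_MEMB.nem.35
New.CleanUp.ReferenceOTU120578 12116.0 D_0__Eukaryota 0.780
EOF

awk '
    sub(/^#OTU_/, "") {           # header line: $0 is now e.g. MEMB.nem.6
        if (out != "") close(out) # added guard: nothing to close before the first block
        out = $0 ".txt"
        next
    }
    !/^--/ && out != "" {         # added guard: skip any lines before the first header
        print $1 > out
    }
' OTU_test.txt

# Show each generated file with its name.
head MEMB.nem.*.txt
```

One caveat worth keeping in mind: because each file is closed when the next header appears, a header name that occurs twice in the input would reopen its file with `>` and overwrite the earlier content. If repeated headers are possible, `>>` may be the safer redirection, at the cost of appending to files left over from earlier runs.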
Upvotes: 3