Tiago Pereira

Extracting text blocks from an input file with awk or sed and saving each block to a separate output file

I am trying to use "awk" to extract text blocks based on separators (# and --). I only need the first field/column, but each block spans multiple lines, and the number of lines varies between blocks. This first column holds sequence IDs.

Using "awk", I can separate the blocks and print the first column, but I cannot redirect each text block to a separate output file.

Code:

awk '/#/,/--/{print $1}' OTU_test.txt
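The range pattern /#/,/--/ matches every line from one containing # through the next line containing --, inclusive. That is why the header and separator lines are printed too, and why everything goes to standard output. Below is a minimal sketch that keeps the range idea but sends each block to its own file. It assumes that every header starts with "#OTU_" (so the file name is everything after those 5 characters) and that every separator starts with "--". The sample input is the one from the question, with the taxonomy columns shortened.

```shell
#!/bin/sh
# Sketch: the range-pattern approach from the question, extended to write one file per block.
# Assumptions: headers look like "#OTU_<name>", separators start with "--".
cd "$(mktemp -d)" || exit 1   # scratch directory so the demo does not clutter the current one

# Sample input from the question, taxonomy columns shortened for brevity.
cat > OTU_test.txt <<'EOF'
#OTU_MEMB.nem.6
EF494252.1.2070 6750.0 D_0__Eukaryota 1.000
FJ235519.1.1436 5957.0 D_0__Eukaryota 1.000
New.ReferenceOTU9219 5418.0 D_0__Eukaryota 1.000
GQ120120.1.1635 471.0 D_0__Eukaryota 0.990
--
#OTU_MEMB.nem.163
New.CleanUp.ReferenceOTU59580 12355.0 D_0__Eukaryota 0.700
New.ReferenceOTU11809 1312.0 D_0__Eukaryota 0.770
--
#OTU_MEMB.nem.35
New.CleanUp.ReferenceOTU120578 12116.0 D_0__Eukaryota 0.780
EOF

awk '
/^#/, /^--/ {                         # from a header through the next separator, inclusive
    if (/^#/) {                       # header: switch to a new output file
        close(out)                    # close the file of the previous block, if any
        out = substr($1, 6) ".txt"    # drop the 5-character "#OTU_" prefix
        next
    }
    if (/^--/) next                   # separator: do not write it
    print $1 > out                    # write the sequence ID only
}
' OTU_test.txt
```

The last block has no closing "--", so its range simply runs to the end of the file. Because every line of this sample belongs to some block, the range condition changes nothing here. It would only matter if the file had text between a "--" and the next header.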

Ideally, I would like to save each text block (excluding the separators) to a file named after text found in the first line of that block (e.g. MEMB.nem.6; the "MEMB.nem." part is constant, but the number changes).

Example of input file:

#OTU_MEMB.nem.6
EF494252.1.2070 6750.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Nucletmycea;D_3__Fungi;D_7__Dothideomycetes;D_8__Capnodiales;D_9__uncultured fungus 1.000
FJ235519.1.1436 5957.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Nucletmycea;D_3__Fungi;D_7__Dothideomycetes;D_8__Capnodiales;D_9__uncultured fungus 1.000
New.ReferenceOTU9219 5418.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Nucletmycea;D_3__Fungi 1.000 
GQ120120.1.1635 471.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Nucletmycea;D_3__Fungi;D_7__Dothideomycetes;D_8__Capnodiales;D_9__uncultured fungus 0.990
--
#OTU_MEMB.nem.163
New.CleanUp.ReferenceOTU59580 12355.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_7__Chromadorea;D_8__Monhysterida 0.700
New.ReferenceOTU11809 1312.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_7__Chromadorea;D_8__Monhysterida 0.770
--
#OTU_MEMB.nem.35
New.CleanUp.ReferenceOTU120578 12116.0 D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_7__Chromadorea;D_8__Desmoscolecida;D_9__Desmoscolex sp. DeCoSp2 0.780

Expected output files (first column only, no separators).

MEMB.nem.6.txt

EF494252.1.2070 
FJ235519.1.1436 
New.ReferenceOTU9219 
GQ120120.1.1635

MEMB.nem.163.txt

New.CleanUp.ReferenceOTU59580
New.ReferenceOTU11809

MEMB.nem.35.txt

New.CleanUp.ReferenceOTU120578

I have searched a lot, but so far without success. I would be very happy if someone could advise me.

Thanks,

Tiago

Answers (1)

Ed Morton

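awk '
sub(/^#OTU_/,"") {      # header line: strip the "#OTU_" prefix, then
    close(out)          # close the file of the previous block, if any,
    out = $0 ".txt"     # and name the next file after the rest, e.g. MEMB.nem.6.txt
    next
}
!/^--/ {                # any line that is not a "--" separator
    print $1 > out      # write only the first field (the sequence ID)
}
' file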
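The answer above assumes every line after the first header is either data or a "--" separator. If the real file can contain blank lines (for example, a trailing empty line) or text before the first #OTU_ header, two things go wrong. Blank lines are written to the current file as empty lines. A line before any header makes awk redirect to an empty file name, which aborts the script (gawk reports a fatal null-string redirection error). The following is a slightly more defensive sketch, not the original answer, under the same assumption that headers start with "#OTU_" and separators with "--". The stray line and blank lines in its sample input are made up on purpose to exercise those cases.

```shell
#!/bin/sh
# Defensive variant of the answer (a sketch).
# Assumptions: headers start with "#OTU_", separators start with "--".
cd "$(mktemp -d)" || exit 1   # scratch directory for the demo

# Made-up sample: a stray line before the first header and blank lines, added on purpose.
printf '%s\n' \
    'stray preamble line' \
    '#OTU_MEMB.nem.6' \
    'EF494252.1.2070 6750.0 D_0__Eukaryota 1.000' \
    '' \
    'FJ235519.1.1436 5957.0 D_0__Eukaryota 1.000' \
    '--' \
    '#OTU_MEMB.nem.35' \
    'New.CleanUp.ReferenceOTU120578 12116.0 D_0__Eukaryota 0.780' \
    '' > OTU_test.txt

awk '
sub(/^#OTU_/, "") {           # header: strip the prefix and switch output files
    close(out)
    out = $0 ".txt"
    next
}
NF && out != "" && !/^--/ {   # skip blank lines, pre-header text and separators
    print $1 > out
}
' OTU_test.txt
```

Checking NF (the number of fields) skips empty and whitespace-only lines. Checking out != "" ignores anything that appears before the first header, instead of letting the redirection fail.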
