Reputation: 163
I have a file that looks like this:
LOCUS contig_142 11028 bp DNA UNK 07-JUN-2020
DEFINITION .
ACCESSION
VERSION
KEYWORDS .
SOURCE tx-145
ORGANISM tx-145
Unclassified.
COMMENT .
FEATURES Location/Qualifiers
CDS 38..1026
/locus_tag="tx-145_00001"
/transl_table=11
/translation="VRLPQKKQLIHTELLDGLSAKMDFSPYLAEEHNPVQSARPVPRKK
PYQGDVPLEALLEDIKARTKVPAYRLRVRRGKTPGLTDSKIGGLPYWDLSQPYPADEKG
QPMQLLAQINFGAEDMDKPFPKTGLLQFFIGLDEMFGCNFAYAPDQKNYRVVYHPEIDG
SVTPDKVSALGVPGLVNDYRTSPLEAELAIYAEREDSFANDRSFVFEDAFRAAVQAVMG
VDMGEKESYEFLDEDAYDELFESFQETDDGCMNGGHWMLGYPSFTQEDPRPEDSPFDTL
LLQIDSMRDEDGGNPILWGDCGVCNFFIARTDLEKLDFSQVLYNWDCC"
CDS 1255..2219
/locus_tag="tx-145_00002"
/transl_table=11
/translation="MKQRIFITLLLLVLLLASCGQAAQPHAQSEPAATPSEVEKIAFTD
ALGQDFFIDPPQRAVVMIGSFADVWVLAGGEDVLAATANDAWESYALDLPEDTVNIGSP
MKPNVELVLGAQPDLIIASSLSPSNLELQETFQRAGIPAAYFDVSSFQDYLDLLELFTR
LTGRPENYETYGAAVKAQVDGAVDRRVEYSFAPTVLTIQVSGSSVKVKNSEDNVLGPML
KELGCENIADRDGSLLEDLSLEAILQADPDFIFAVYHGTDEAAARANLEESLLSNPAWA
SLSAVEGGRFHILERRMFSLKPNALWGDAYEQLADILCGE"
I would like to use grep/awk/sed to find the locus_tag tx-145_00002 and, if it is found, retrieve the contig ID (contig_142 here), which appears several lines before the match.
Note: I have tried grep -B NUMBER_of_lines, but the number of lines between the two matches is not consistent and varies widely from sample to sample.
Appreciate your help with this. Thank you!
Sorry about editing this late, but if possible, my expected output should be this:
contig_142
tx-145_00002
Upvotes: 1
Views: 225
Reputation: 133458
Could you please try the following? Written and tested with the shown samples in GNU awk.
awk -v valtofind="tx-145_00002" '
/^LOCUS/{
id=$2
next
}
/\/locus_tag/ && $0 ~ "\""valtofind"\"$" {
print id,valtofind
id=""
}
' Input_file
Explanation: Adding detailed explanation for above.
awk -v valtofind="tx-145_00002" ' ##Starting awk program from here and setting variable valtofind to the value which OP wants to look for.
/^LOCUS/{ ##Checking condition if a line starts from LOCUS then do following.
id=$2 ##Setting id with $2 of current line.
next ##next will skip all further statements from here.
}
/\/locus_tag/ && $0 ~ "\""valtofind"\"$" { ##Checking condition if line has /locus_tag and variable at the end of line then do following.
print id,valtofind ##Printing id and variable here.
id="" ##Nullifying id here.
}
' Input_file ##Mentioning Input_file name here.
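The following is a minimal sketch of the same idea, run against a hypothetical trimmed sample (the contig_141 record and the temporary file are made up for illustration). It differs from the program above in two ways that are assumptions about what is wanted: it prints the contig ID and the tag on separate lines, as in the expected output in the question, and it uses index() for a literal string comparison instead of a regex match.

```shell
# Hypothetical trimmed sample with two records, so the right LOCUS must be picked.
sample=$(mktemp)
cat > "$sample" <<'EOF'
LOCUS contig_141 5000 bp DNA UNK 07-JUN-2020
FEATURES Location/Qualifiers
CDS 10..900
/locus_tag="tx-144_00001"
LOCUS contig_142 11028 bp DNA UNK 07-JUN-2020
FEATURES Location/Qualifiers
CDS 38..1026
/locus_tag="tx-145_00001"
CDS 1255..2219
/locus_tag="tx-145_00002"
EOF

result=$(awk -v valtofind="tx-145_00002" '
# Remember the contig ID of the most recent LOCUS record.
/^LOCUS/ { id = $2; next }
# On a locus_tag line containing the quoted value, print ID and tag on two lines.
/\/locus_tag/ && index($0, "\"" valtofind "\"") { print id; print valtofind }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Because index() compares literally, characters such as a dot in the tag would not be treated as regex metacharacters.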
Upvotes: 1
Reputation: 58371
This might work for you (GNU sed):
sed -E '/^LOCUS/h;/locus_tag.*tx-145_00002/!d;x;s/^\S+\s+(\S+).*/\1/' file
On a match with LOCUS, make a copy of this line in the hold space. Every line that does not contain both locus_tag and tx-145_00002 is deleted. On the line that does match, swap to the copy and extract the id.
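If the tag should be printed as well, the hold-space approach can be extended. This is only a sketch, assuming GNU sed (it relies on \s, \S and \n in the replacement) and a hypothetical trimmed sample file. Instead of swapping, it appends the saved LOCUS line to the matching line with G and rearranges the two pieces in one substitution.

```shell
# Hypothetical trimmed sample with two records.
sample=$(mktemp)
cat > "$sample" <<'EOF'
LOCUS contig_141 5000 bp DNA UNK 07-JUN-2020
CDS 10..900
/locus_tag="tx-144_00001"
LOCUS contig_142 11028 bp DNA UNK 07-JUN-2020
CDS 38..1026
/locus_tag="tx-145_00001"
CDS 1255..2219
/locus_tag="tx-145_00002"
EOF

# h: save each LOCUS line in the hold space.
# G: on the matching tag line, append the saved LOCUS line after a newline.
# s: keep only the contig ID (group 2) and the quoted tag (group 1).
sed_result=$(sed -nE '
/^LOCUS/h
/locus_tag="tx-145_00002"/{
G
s/.*"(.*)"\nLOCUS\s+(\S+).*/\2\n\1/p
}
' "$sample")

printf '%s\n' "$sed_result"
rm -f "$sample"
```

Using G rather than x leaves the saved LOCUS line in the hold space, so a later match within the same record would still find it.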
Upvotes: 1