Reputation: 446
I have a large data file which looks like:
//
ID 1.1.1.258
DE 6-hydroxyhexanoate dehydrogenase.
CA 6-hydroxyhexanoate + NAD(+) = 6-oxohexanoate + NADH.
CC -!- Involved in the cyclohexanol degradation pathway in Acinetobacter
CC NCIB 9871.
//
ID 1.1.1.259
DE 3-hydroxypimeloyl-CoA dehydrogenase.
CA 3-hydroxypimeloyl-CoA + NAD(+) = 3-oxopimeloyl-CoA + NADH.
CC -!- Involved in the anaerobic pathway of benzoate degradation in
CC bacteria.
//
ID 1.1.1.260
DE Sulcatone reductase.
CA Sulcatol + NAD(+) = sulcatone + NADH.
CC -!- Studies on the effects of growth-stage and nutrient supply on the
CC stereochemistry of sulcatone reduction in Clostridia pasteurianum,
CC C.tyrobutyricum and Lactobacillus brevis suggest that there may be at
CC least two sulcatone reductases with different stereospecificities.
//
I want to extract the sections of this file that contain the word anaerobic. I specifically want the ID line.
Is there a way to search the file between ID and // for anaerobic and print the result to a new file? Printing the whole section is fine, as I figure I can grep the ID line out afterwards.
Expected output should be either
ID 1.1.1.259
or
ID 1.1.1.259
DE 3-hydroxypimeloyl-CoA dehydrogenase.
CA 3-hydroxypimeloyl-CoA + NAD(+) = 3-oxopimeloyl-CoA + NADH.
CC -!- Involved in the anaerobic pathway of benzoate degradation in
CC bacteria.
//
Upvotes: 2
Views: 1625
Reputation: 7499
For variety, a possible GNU sed solution:
sed -nr ':a; \@(^|\n)//$@! { N; ba }; /anaerobic/p' data
-n => suppresses automatic printing of the pattern space
-r => enables extended regular expressions
:a => defines a label named a
ba => jumps to the label a
N => appends the next line to the pattern space
\@(^|\n)//$@! => matches "sections" that don't yet end with //
\@(^|\n)//$@! { N; ba } therefore keeps appending lines to the pattern space until it finds the // section delimiter. /anaerobic/p then checks whether the current section contains anaerobic and, if it does, the p command prints it.
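Since the question asks specifically for the ID line, the same loop can be trimmed to print only the first line of each matching section. This is a minimal sketch, assuming GNU sed and assuming the ID line is always the first line of a section; the function name anaerobic_ids is made up for illustration:

```shell
# Hypothetical wrapper around the sed command above.
# Assumptions: GNU sed, and every section starts with its ID line.
anaerobic_ids() {
    # $1 = data file
    # Collect lines until the // delimiter, then, for sections that
    # contain "anaerobic", drop everything after the first line and print it.
    sed -nr ':a; \@(^|\n)//$@! { N; ba }; /anaerobic/ { s/\n.*//; p }' "$1"
}
```

It could be called as anaerobic_ids data > ids.txt to write the matching ID lines to a new file.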
Upvotes: 2
Reputation: 129
It's simple with awk:
awk '/anaerobic/' RS='//\n' ORS='//\n' ./file.txt
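The command works by treating the // delimiter line as the record separator, so each section becomes one record. A multi-character RS is handled as a regular expression by gawk and mawk, but POSIX leaves that behaviour unspecified. As a rough alternative that reads line by line and prints only the ID line, here is a sketch; the function name section_ids and its arguments are hypothetical, and the search word is matched as a literal string:

```shell
# Hypothetical helper (not part of the answer above): print the ID line of
# every section that contains the given word, reading the file line by line.
section_ids() {
    # $1 = literal word to look for, $2 = data file
    awk -v word="$1" '
        $0 == "//" { id = ""; done = 0; next }   # delimiter: reset per-section state
        /^ID /     { id = $0 }                   # remember the current ID line
        !done && id != "" && index($0, word) {   # first hit inside this section
            print id
            done = 1
        }
    ' "$2"
}
```

For example, section_ids anaerobic ./file.txt > ids.txt would write the matching ID lines to a new file.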
Upvotes: 3
Reputation: 80
tac file | sed -n '/anaerobic/,$p' | sed -n '/^ID/ {p;q}'
tac file: prints the file from end to beginning
sed -n '/anaerobic/,$p': prints from the first occurrence of anaerobic to the end of the input
sed -n '/^ID/ {p;q}': searches for a line starting with ID, prints the first occurrence only, then quits
Upvotes: 2