Yoda

Reputation: 690

Print content in-between matched patterns, but only for the first match

I have an output file containing thousands of lines of information. Every so often, the output file contains a block of the following form:

Input orientation:
...
content
...
Distance matrix (angstroms):

I now want to print the content and save it to filename. However, this block occurs at several places in the output file, and I only want the last occurrence. Here's what I've tried so far:

tac output | sed -n -e '/Distance matrix/,/Input orientation/p' > filename

However, this prints all instances of the matched pattern to filename.

Then I read that with GNU sed, of which I have version 4.2.1 installed, the following should work:

tac output | sed -n -e '0,/Distance matrix/,/Input orientation/p' > filename

But this gives me an error:

sed: -e expression #1, char 20: unknown command: `,'

Then I tried to ask sed to quit after matching pattern Input orientation:

tac output | sed -n -e '/Distance matrix/,/Input orientation/{p;q}' > filename

But now it ends up printing only Distance matrix (angstroms): to filename.

I'm sure it is possible; I'm just not able to figure it out! I have no experience with awk, so I would prefer answers using sed.

Sample output file for testing:

Item               Value     Threshold  Converged?
             Maximum Force            0.005032     0.000450     NO
             RMS     Force            0.001066     0.000300     NO
             Maximum Displacement     0.027438     0.001800     NO
             RMS     Displacement     0.007282     0.001200     NO
             Predicted change in Energy=-8.909077D-05
             GradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGrad

                                      Input orientation:
             ---------------------------------------------------------------------
             Center     Atomic      Atomic             Coordinates (Angstroms)
             Number     Number       Type             X           Y           Z
             ---------------------------------------------------------------------
                  1          6           0        Incorrect    Incorrect    Incorrect
                  2          1           0        Incorrect    Incorrect    Incorrect
                  3          1           0        Incorrect    Incorrect    Incorrect
                  4          1           0        Incorrect    Incorrect    Incorrect
                  5         17           0        Incorrect    Incorrect    Incorrect
                  6          9           0        Incorrect    Incorrect    Incorrect
             ---------------------------------------------------------------------
                                Distance matrix (angstroms):
                                1          2          3          4          5
                 1  C    0.000000
                 2  H    1.080163   0.000000
                 3  H    1.080326   1.809416   0.000000
                 4  H    1.080621   1.810236   1.810685   0.000000
                 5  Cl   1.962171   2.470702   2.468769   2.465270   0.000000
                 6  F    2.390537   2.343910   2.357275   2.380515   4.352568
                                6
                 6  F    0.000000

                                          Input orientation:
                 ---------------------------------------------------------------------
                 Center     Atomic      Atomic             Coordinates (Angstroms)
                 Number     Number       Type             X           Y           Z
                 ---------------------------------------------------------------------
                      1          6           0        Correct    Correct     Correct
                      2          1           0        Correct    Correct     Correct
                      3          1           0        Correct    Correct     Correct
                      4          1           0        Correct    Correct     Correct
                      5         17           0        Correct    Correct     Correct
                      6          9           0        Correct    Correct     Correct
                 ---------------------------------------------------------------------
                                    Distance matrix (angstroms):
                                    1          2          3          4          5
                     1  C    0.000000
                     2  H    1.080516   0.000000
                     3  H    1.080587   1.801890   0.000000
                     4  H    1.080473   1.801427   1.801478   0.000000
                     5  Cl   1.936014   2.458132   2.459437   2.460630   0.000000
                     6  F    2.414588   2.368281   2.365651   2.355690   4.350586

Upvotes: 2

Views: 173

Answers (4)

potong

Reputation: 58351

This might work for you (GNU sed):

sed '/Input orientation/h;//!H;$!d;x;s/^\(Input orientation.*Distance matrix[^\n]*\).*/\1/p;d' file

At each occurrence of Input orientation, overwrite the hold space (HS) with the current line; append every other line to the HS, and delete all lines except the last. At the end of the file, swap to the HS, remove everything after the Distance matrix line, and print the result.

An alternative along the same lines, but perhaps less memory-intensive:

sed '/Input orientation/h;//!{x;/./G;x};$!d;x;s/\(Distance matrix[^\n]*\).*/\1/p;d' file
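The hold-space idea can be tried on a small made-up file. This is only a sketch, assuming GNU sed: the file contents are invented, and the leading `^Input orientation` anchor of the first command is relaxed to `^[^\n]*Input orientation`, because the lines in the question's sample are indented and a strict `^Input` would not match them.

```shell
# Made-up stand-in for the real output file; note the leading spaces.
sample=$(mktemp)
cat > "$sample" <<'EOF'
header
 Input orientation:
 old 1
 Distance matrix (angstroms):
 noise
 Input orientation:
 new 1
 new 2
 Distance matrix (angstroms):
 trailing
EOF

# h   : a line matching "Input orientation" overwrites the hold space
# //!H: every other line is appended to the hold space
# $!d : nothing is printed until the last line
# x;s : at the end, swap in the hold space, trim what follows the
#       "Distance matrix" line, and print
result=$(sed '/Input orientation/h;//!H;$!d;x;s/^\([^\n]*Input orientation.*Distance matrix[^\n]*\).*/\1/p;d' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

On this input, only the second block (the lines containing "new") should be printed, from its Input orientation line through its Distance matrix line.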

Upvotes: 0

karakfa

Reputation: 67467

An alternative using awk, without tac:

$ awk '/Input orientation/ {f=1} 
                         f {a=a sep $0; sep=ORS} 
         /Distance matrix/ {f=0; b=a; a=sep=""} 
                       END {print b}' file

Cache each block as it is read; at every end tag (Distance matrix), copy the cache to b and reset it, then print the last saved block at the end.
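The same caching logic can be checked on a small made-up file. This is a sketch with invented file contents; the longer variable names (collecting, current, saved) are only for illustration and stand in for f, a and b above.

```shell
# Made-up stand-in for the real output file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
header
 Input orientation:
 old 1
 Distance matrix (angstroms):
 noise
 Input orientation:
 new 1
 new 2
 Distance matrix (angstroms):
 trailing
EOF

last_block=$(awk '
    /Input orientation/ { collecting = 1 }                       # start tag: begin caching
    collecting          { current = current sep $0; sep = ORS }  # cache the line
    /Distance matrix/   { collecting = 0; saved = current        # end tag: keep a copy,
                          current = sep = "" }                   # then reset the cache
    END                 { print saved }                          # last completed block
' "$sample")

printf '%s\n' "$last_block"
rm -f "$sample"
```

Because the saved copy is overwritten at every end tag, only the final block survives to the END rule, so the file is read once and no tac is needed.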

Upvotes: 0

ctac_

Reputation: 2471

Another solution with sed, without tac:

sed ':B;$x;/Input/!d;x;s/.*//;;x;:A;/Distance/!{N;bA};h;N;s/.*\n//;bB' infile

Keep the current block in the hold space and discard it when a new one is found.

Upvotes: 0

Sundeep

Reputation: 23667

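That is because sed quits as soon as it executes q, which in your command happens on the very first line of the range. You need to qualify q with the closing pattern so that it only fires there: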

$ tac ip.txt | sed -n '/Distance matrix/,/Input orientation/{p;/Input orientation/q}' | tac
                                          Input orientation:
                 ---------------------------------------------------------------------
                 Center     Atomic      Atomic             Coordinates (Angstroms)
                 Number     Number       Type             X           Y           Z
                 ---------------------------------------------------------------------
                      1          6           0        Correct    Correct     Correct
                      2          1           0        Correct    Correct     Correct
                      3          1           0        Correct    Correct     Correct
                      4          1           0        Correct    Correct     Correct
                      5         17           0        Correct    Correct     Correct
                      6          9           0        Correct    Correct     Correct
                 ---------------------------------------------------------------------
                                    Distance matrix (angstroms):

With awk:

tac ip.txt | awk '/Distance matrix/{f=1} f; /Input orientation/{exit}' | tac
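The difference between the unqualified and the qualified q can be seen side by side on a tiny made-up file. This is a sketch assuming GNU sed and tac; the file contents are invented for the comparison.

```shell
# Made-up stand-in for the real output file.
sample=$(mktemp)
printf '%s\n' 'Input orientation:' 'old' 'Distance matrix (angstroms):' \
              'Input orientation:' 'new' 'Distance matrix (angstroms):' 'tail' > "$sample"

# Unqualified q: it runs on the first line of the range, so sed stops
# after printing a single line.
unqualified=$(tac "$sample" | sed -n '/Distance matrix/,/Input orientation/{p;q}')

# Qualified q: sed quits only after the closing pattern has been printed;
# the final tac restores the original line order.
qualified=$(tac "$sample" | sed -n '/Distance matrix/,/Input orientation/{p;/Input orientation/q}' | tac)

printf 'unqualified:\n%s\n\nqualified:\n%s\n' "$unqualified" "$qualified"
rm -f "$sample"
```

The first variant should yield only the Distance matrix line, while the second yields the whole last block.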

See also: How to select lines between two patterns?

Upvotes: 1
