simplygades

Reputation: 27

sed: Dynamically remove all text columns except positions defined by pattern

Through searching and trial and error (I am no regex expert), I have managed to process some text output with sed and grep and extract lines formatted like this:

Tree number 280:
1         0.500      1      node_15 6 --> H 1551.code
                     1      node_21 S ==> H node_20
Tree number 281:
1         0.500      1      node_16 S ==> M 1551.code
                     1      node_20 S --> H node_19

Then, using

sed 's/^.\{35\}\(.\{9\}\).*/\1/' infile

I get the desired part, plus some extra output that I remove later (not a problem):

Tree number 280:
 6 --> H 
 S ==> H 
Tree number 281:
 S ==> M
 S --> H

However, the horizontal position of the C --> C pattern may vary from file to file, although within a given file it is always aligned. Is there a way to extract the --> or ==> together with the single preceding and single following character, no matter which columns they are found in?

The Tree number # part is not necessary and could be left blank, but there has to be some kind of separator between trees.

UPDATE (alternative approach)

Trying to use grep, I issued

grep -Eo '(([a-zA-Z0-9] --|[a-zA-Z0-9] ==)> [a-zA-Z0-9]|Changes)' infile

A sample of my initial file follows. If anyone can think of a better, more efficient approach, or if my use of regex is insane, please comment!

..MISC TEXT...

Character change lists:


Character    CI  Steps                  Changes
----------------------------------------------------------------
1         0.000      1         node_235 H --> S node
                     1         node_123 S ==> 6 1843
                     1         node_126 S ==> H 2461
                     1         node_132 S ==> 6 1863
                     1         node_213 H --> I 1816
                     1         node_213 H --> 8 1820
..MISC TEXT...

Character change lists:

Character    CI  Steps                  Changes
----------------------------------------------------------------
1         0.000      1         node_165 H --> S node
                     1         node_123 S ==> 6 1843
                     1         node_231 H ==> S 1823
..MISC TEXT...

Upvotes: 0

Views: 139

Answers (1)

user3897784


grep is a bit easier if you just want to extract the text that matches a regex. If you need other arrow characters, add them to the bracket expression [-=]. No pipes are needed there: inside brackets, | is a literal character, not an alternation.

grep -o '. [-=][-=]> .' infile
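
If the Tree number lines should survive as separators, as the question asks, one possible variation is to add them as a second alternative with -E. This is only a sketch: the sample data is copied from the question, and the temporary file and the infile and result variable names are just placeholders for illustration.

```shell
# Sample input copied from the question, written to a temporary file.
infile=$(mktemp)
cat > "$infile" <<'EOF'
Tree number 280:
1         0.500      1      node_15 6 --> H 1551.code
                     1      node_21 S ==> H node_20
Tree number 281:
1         0.500      1      node_16 S ==> M 1551.code
                     1      node_20 S --> H node_19
EOF

# -E enables alternation with |, -o prints only the matched text.
# First alternative keeps the header lines as separators; the second
# extracts "X --> Y" or "X ==> Y" regardless of the column it sits in.
# Inside a bracket expression | is literal, so [-=] is enough.
result=$(grep -Eo '^Tree number [0-9]+:|. [-=][-=]> .' "$infile")
printf '%s\n' "$result"

rm -f "$infile"
```

Because the match is anchored on the arrow itself rather than on a column count, the same command should keep working when the arrow column shifts between files.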

Or, if you really want to use sed for this, the following should do. The address matches only lines that contain an arrow, and the substitution keeps just the part that matches the regex:

sed -n '/[-=][-=]>/{s/.*\(. [-=][-=]> .\).*/\1/p;}' infile
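
The same idea might also be applied directly to the initial file from the question's update, printing each "Character change lists:" line as the block separator. This is a hedged sketch on a trimmed copy of that sample; the choice of separator line, the temporary file and the variable names are assumptions made for the example.

```shell
# Trimmed copy of the "initial file" sample from the question.
infile=$(mktemp)
cat > "$infile" <<'EOF'
Character change lists:

Character    CI  Steps                  Changes
----------------------------------------------------------------
1         0.000      1         node_235 H --> S node
                     1         node_123 S ==> 6 1843
Character change lists:

Character    CI  Steps                  Changes
----------------------------------------------------------------
1         0.000      1         node_165 H --> S node
                     1         node_231 H ==> S 1823
EOF

# -n suppresses default output, so only explicit p commands print.
# 1st expression: print the block header unchanged, as a separator.
# 2nd expression: reduce arrow lines to "X --> Y" / "X ==> Y" and print them.
result=$(sed -n -e '/^Character change lists:/p' \
                -e 's/.*\(. [-=][-=]> .\).*/\1/p' "$infile")
printf '%s\n' "$result"

rm -f "$infile"
```

The row of dashes under the column titles is not printed, because it contains no > and so the substitution never succeeds on it.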

Upvotes: 1
