Reputation: 27
By searching and trial and error (I am no regex expert), I have managed to process a text output using sed and grep and extract some lines, formatted this way:
Tree number 280:
1 0.500 1 node_15 6 --> H 1551.code
1 node_21 S ==> H node_20
Tree number 281:
1 0.500 1 node_16 S ==> M 1551.code
1 node_20 S --> H node_19
Then, using
sed 's/^.\{35\}\(.\{9\}\).*/\1/' infile
I get the desired part, plus some extra output that I get rid of later (not a problem):
Tree number 280:
6 --> H
S ==> H
Tree number 281:
S ==> M
S --> H
However, the horizontal position of the C --> C pattern may vary from file to file, although it is always aligned within a file. Is there a way to extract the --> or ==> including the single preceding and following characters, no matter which columns they are found in?
The Tree number # part is not necessary and could be left blank as well, but there has to be a separator of some kind.
UPDATE (alternative approach)
Trying to use grep, I issued:
grep -Eo '(([a-zA-Z0-9] -- |[a-zA-Z0-9] ==)> [a-zA-Z0-9]|Changes)' infile
A sample of my initial file follows. If anyone can think of a better, more efficient approach, or if my use of regex is insane, please comment!
..MISC TEXT...
Character change lists:
Character CI Steps Changes
----------------------------------------------------------------
1 0.000 1 node_235 H --> S node
1 node_123 S ==> 6 1843
1 node_126 S ==> H 2461
1 node_132 S ==> 6 1863
1 node_213 H --> I 1816
1 node_213 H --> 8 1820
..CT...
Character change lists:
Character CI Steps Changes
----------------------------------------------------------------
1 0.000 1 node_165 H --> S node
1 node_123 S ==> 6 1843
1 node_231 H ==> S 1823
..MISC TEXT...
Upvotes: 0
Views: 139
Reputation:
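Grep is a bit easier for just extracting the matching text. If you need different separators, add them to the bracket expression. Characters inside [...] are listed directly, without pipes (a pipe there would match a literal |), so [-=] matches either a hyphen or an equals sign:
grep -o '. [-=][-=]> .' infile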
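As a quick check, here is a minimal sketch that runs the bracket-expression form against a few lines copied from the question's sample. The function name `extract_changes` and the here-doc input are illustrative assumptions, and `-o` is assumed to be available (it is supported by GNU and BSD grep, but is not part of POSIX).

```shell
# Hypothetical helper: print every "X --> Y" or "X ==> Y" triple read from stdin.
extract_changes() {
  # -o prints only the matched text; [-=] matches a single '-' or '='.
  grep -o '. [-=][-=]> .'
}

# Sample lines taken from the question; the header line has no arrow and is dropped.
sample_output=$(extract_changes <<'EOF'
Character change lists:
 1 0.000 1 node_235 H --> S node
 1 node_123 S ==> 6 1843
 1 node_213 H --> 8 1820
EOF
)

printf '%s\n' "$sample_output"
```

Because the pattern anchors on the arrow itself rather than on a column count, it should keep working when the arrow column shifts from file to file.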
Or, if you really want to use sed for this, the following should work. The address matches only lines that contain the pattern, and the substitution keeps only the matching text:
sed -n '/[-=][-=]>/s/.*\(. [-=][-=]> .\).*/\1/p' infile
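The question also asks for some kind of separator between trees. One possible way to get that, sketched below, is to print the "Tree number" header lines unchanged and reduce every other matching line to its triple. The function name `extract_with_separator` is a hypothetical choice, and the sketch assumes each data line contains at most one arrow.

```shell
# Hypothetical helper: keep "Tree number" headers as separators,
# and reduce each data line to its "X --> Y" / "X ==> Y" triple.
extract_with_separator() {
  sed -n \
    -e '/^Tree number/p' \
    -e 's/.*\(. [-=][-=]> .\).*/\1/p'
}

# Input copied from the first listing in the question.
tree_output=$(extract_with_separator <<'EOF'
Tree number 280:
1 0.500 1 node_15 6 --> H 1551.code
1 node_21 S ==> H node_20
Tree number 281:
1 0.500 1 node_16 S ==> M 1551.code
1 node_20 S --> H node_19
EOF
)

printf '%s\n' "$tree_output"
```

With `-n`, sed prints only what the two expressions ask for: header lines via the first, and successfully substituted lines via the `p` flag on the second. Lines that match neither are silently dropped, so no clean-up pass should be needed afterwards.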
Upvotes: 1