Bhabani Shankar

Reputation: 1285

Find string between char in unix

I have a basic question. I have a string like the one below:

on one off abcd on two off

I want to extract every string between 'on' and 'off'. The result I am expecting here is 'one' and 'two'.

I believe this is possible with sed.

I tried sed 's/on\(.*\)off/\1/g', but this returns one off abcd on two
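
The attempt can be reproduced with the short sketch below. It suggests why only the outer pair disappears: .* is greedy, so it stretches from the first on to the last off. The function name greedy_between is made up for illustration; the sed expression is the one from the attempt, unchanged.

```shell
# Hypothetical helper: applies the substitution from the question as-is.
greedy_between() {
    printf '%s\n' "$1" | sed 's/on\(.*\)off/\1/g'
}

# .* spans from the first "on" to the last "off", so everything in between
# (including the inner "off" and "on") is kept.
greedy_between 'on one off abcd on two off'
```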

Upvotes: 0

Views: 39

Answers (3)

NeronLeVelu

Reputation: 10039

sed 's/\(.*\) off.*/ \1³/;s/ off /³/g;s/ on /²/g;s/³[^²]*²/³²/g;s/^[^²]*²/²/;s/²/\
/g;s/.//;s/³//g'
  • Use ² and ³ as single-character stand-ins for on and off, because POSIX sed cannot negate a group, only a character class. Any other character that does not occur in the string could be used instead (preferably not a metacharacter such as &).
  • The remaining commands remove the content outside the on/off pairs and reformat the result.
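
The single-character idea can be tried in isolation with a smaller pipeline. This is only a sketch of the technique, not the command above: it assumes that @ and # never occur in the input, that on and off are separated from their content by single spaces, and that grep supports -o (common in GNU and BSD grep). The function name between_single_char is hypothetical.

```shell
# Hypothetical helper. Assumes '@' and '#' do not appear in the input.
# Step 1 (sed):  collapse " off" to '#' and "on " to '@'.
# Step 2 (grep): [^#]* cannot run past the next '#', so each match is one pair.
# Step 3 (tr):   drop the marker characters.
between_single_char() {
    printf '%s\n' "$1" |
        sed 's/ off/#/g; s/on /@/g' |
        grep -o '@[^#]*#' |
        tr -d '@#'
}

between_single_char 'on one off abcd on two off'
```

Once each delimiter is a single character, a negated bracket expression such as [^#]* does the job that a non-greedy match would do in other regex flavors.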

Upvotes: 0

Jotne

Reputation: 41456

Here is an awk version

awk -v RS=" " '/\<off\>/ {f=0} f; /\<on\>/ {f=1}' file
one
two
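
The one-liner works by toggling a flag f. A spelled-out sketch of the same idea follows; it is a variant, not the command above: it splits the words with tr and compares whole lines, so it does not need the \< and \> word-boundary operators, which are not available in every awk. The function name between_flag is hypothetical, and content spanning several words would be printed one word per line.

```shell
# Hypothetical helper: one word per line, then a flag toggled by "on"/"off".
between_flag() {
    printf '%s\n' "$1" | tr -s ' ' '\n' |
        awk '$0 == "off" { f = 0 }   # closing delimiter: stop printing
             f                       # flag set: print the current word
             $0 == "on"  { f = 1 }   # opening delimiter: print from the next word'
}

between_flag 'on one off abcd on two off'
```

The order of the three rules matters: off clears the flag before the print rule runs, and on sets it only after, so neither delimiter is printed.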

Upvotes: 0

Wintermute

Reputation: 44043

With sed, I think the easiest way is to use two sed processes:

echo 'on one off abcd on two off' | sed 's/\<on\>[[:space:]]*/\non\n/g; s/[[:space:]]*\<off\>/\noff\n/g' | sed -n '/^on$/,/^off$/ { //!p; }'
one
two

This falls into two parts:

sed 's/\<on\>[[:space:]]*/\non\n/g; s/[[:space:]]*\<off\>/\noff\n/g'

puts each on and off on its own, easily recognizable line, and

sed -n '/^on$/,/^off$/ { //!p; }'

prints just the stuff between them.

Alternatively, you could do it with Perl (which supports non-greedy matching and lookarounds):

$ echo 'on one off abcd on two off' | perl -pe 's/.*?\bon\b\s*(.*?)\s*\boff\b.*?((?=\bon\b)|$)/\1\n/g; s/\n$//'
one
two

Where the

s/.*?\bon\b\s*(.*?)\s*\boff\b.*?((?=\bon\b)|$)/\1\n/g

puts each piece of text between \bon\b and \boff\b (where \b matches a word boundary) on its own line. The main trick is that .*? matches non-greedily, which is to say it matches the shortest string that still lets the full regex match. The (?=\bon\b) is a zero-width lookahead assertion, so the trailing .*? stops just before the next on delimiter or at the end of the line (this discards the data between off and on).

The

s/\n$//

just removes the last newline that we don't need or want.

Upvotes: 2
