Chaos_99

Reputation: 2304

How to get egrep to match ^ for every line (as it should be)

I have a file containing these lines:

SOME COMMAND 34 XXXXX ;
; a comment which may contain a : 
      sometext001 : X00 : 1 ;
                  : X01 : 1 ;
                  : X11 : 1 ;

I want to retrieve sometext001 with grep/egrep.

Using the regex ^\s*[^:\s;]+\s*:

(in words: starting at the beginning of the line with zero or more whitespace characters, followed by at least one character that is not whitespace, a colon, or a semicolon, followed by optional whitespace and then a colon)

I'm able to match the text (including the following :) using an online regex tester http://regexr.com?35eam if I enable multiline support.

I was under the impression that grep/egrep work line by line anyway, so why does the regex not work when used with egrep on a file containing this example?

Is there another way to achieve the desired result with egrep or, if that's not possible, with another one-liner callable from a shell script?

Update: although the proposed change of the regex to ^[[:space:]]*[^[:space:];]+[[:space:]]*: matches the lines specified, it still matches twice in that line, once for sometext001 : and once for X00 :, as is evident when using the -o option to egrep. How can I solve this?

Update: The test file contained exactly the text given above. The command line was egrep -o '^([[:space:]]*[^:[:space:];]+[[:space:]]*:)' test.txt (also tried without the () pair). Output is

      sometext001 :
X00 :        

Upvotes: 1

Views: 514

Answers (3)

Kent

Reputation: 195169

With GNU grep:

grep -Po '^\s*\K[^\s:;]*(?= :)'

With your example:

kent$  echo "SOME COMMAND 34 XXXXX ;
; a comment which may contain a : 
      sometext001 : X00 : 1 ;
                  : X00 : 1 ;
                  : X11 : 1 ;"|grep -Po '^\s*\K[^\s:;]*(?= :)'
sometext001

Upvotes: 0

anubhava

Reputation: 785376

It is better to use -P (the Perl-compatible regex switch) with the regex that you already have:

grep -P '^\s*[^:\s;]+\s*:'
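
A small sketch of why the switch matters, assuming a GNU grep built with PCRE support (the variable names are only illustrative). Inside a bracket expression, POSIX treats the backslash as a literal character, so `[^:\s;]` in an ERE excludes `\` and `s` instead of whitespace, and the leading `s` of sometext001 blocks the match. With -P the same bracket expression excludes whitespace as intended.

```shell
# One line from the question's sample file.
sample='      sometext001 : X00 : 1 ;'

# ERE: within [...] the backslash is literal, so the negated set is
# ':', '\', 's' and ';'. The name starts with 's', so nothing matches.
# '|| true' keeps the exit status of a zero-match grep from aborting
# scripts that run under 'set -e'.
ere_count=$(printf '%s\n' "$sample" | grep -cE '^[[:space:]]*[^:\s;]+[[:space:]]*:' || true)

# PCRE (assumes grep -P is available): \s means whitespace everywhere,
# including inside the bracket expression.
pcre_count=$(printf '%s\n' "$sample" | grep -cP '^\s*[^:\s;]+\s*:' || true)

echo "ERE matches: $ere_count, PCRE matches: $pcre_count"
```

On a grep without PCRE support the second command reports an error instead of a count, so this is only useful where -P exists.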

Upvotes: 1

Tim Pietzcker

Reputation: 336308

egrep uses POSIX EREs by default, and those don't recognize \s and other Perl-style shorthands. Try

^[[:space:]]*[^:[:space:];]+[[:space:]]*:
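
As a rough sketch of this pattern in use, assuming the sample lines are saved as test.txt (the file name from the question): grep -E selects the matching line, and a sed substitution built on the same pattern prints only the captured name, which avoids relying on -o. The sed call and the variable names are illustrative additions, not part of the original answer.

```shell
# Recreate the sample file from the question.
cat > test.txt <<'EOF'
SOME COMMAND 34 XXXXX ;
; a comment which may contain a : 
      sometext001 : X00 : 1 ;
                  : X01 : 1 ;
                  : X11 : 1 ;
EOF

# POSIX ERE: [[:space:]] stands in for \s, both on its own and inside
# the negated bracket expression.
matched_lines=$(grep -E '^[[:space:]]*[^:[:space:];]+[[:space:]]*:' test.txt)
printf '%s\n' "$matched_lines"

# Print just the name: capture the first field, replace the whole line
# with the capture, and use -n with the p flag so only lines where the
# substitution succeeded are printed.
name=$(sed -nE 's/^[[:space:]]*([^:[:space:];]+)[[:space:]]*:.*/\1/p' test.txt)
printf '%s\n' "$name"
```

Only the third line of the file has a non-whitespace, non-colon, non-semicolon token directly before its first colon, so it is the only line either command reports.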

Upvotes: 2
