user32474

Reputation: 337

egrep search for whitespace

I'm trying to use egrep with a regex pattern to match whitespace.

I've used RegEx with Perl and C# before and they both support the pattern \s to search for whitespace. egrep (or at least the version I'm using) does not seem to support this pattern.

In a few articles online I've come across a shorthand [[:space:]], but this does not seem to work. Any help is appreciated.

Using: SunOS 5.10

Upvotes: 22

Views: 63370

Answers (5)

PaulMurrayCbr

Reputation: 1260

If you are using bash, the syntax for putting a tab in a string is

$'foo\tbar'

I was recently working with sed to do some fixups on a tab-delimited file. Part of the command was:

sed -E -e $'s/\t--QUOTE--/\t"/g'

That argument is parsed by bash, and sed sees a regex with literal tabs.
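
In shells that lack $'...' quoting, printf can produce the same literal tab. A minimal sketch, with a made-up sample line and illustrative variable names:

```shell
# Capture a literal tab; command substitution only strips trailing newlines.
tab=$(printf '\t')

# Sample line (hypothetical): a field, a tab, then the --QUOTE-- marker.
fixed=$(printf 'id\t--QUOTE--text\n' | sed "s/${tab}--QUOTE--/${tab}\"/g")
printf '%s\n' "$fixed"
```

As with the bash form, sed only ever sees a regex containing a real tab character.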

Upvotes: 3

Aif

Reputation: 11220

$ cat > file
this line has whitespace
thislinedoesnthave
$ egrep [[:space:]] file 
this line has whitespace

Works under debian.

For Solaris, isn't there something like Gentoo's "eselect", or an alternatives file, to set your default egrep version?

Have you tried grep -E? If the egrep on your path is not the right one, maybe grep is.

Upvotes: -3

Jon Ericson

Reputation: 21525

I see the same issue on SunOS 5.10. /usr/bin/egrep does not support extended regular expressions.

Try using /usr/xpg4/bin/egrep:

$ echo 'this line has whitespace
thislinedoesnthave' | /usr/xpg4/bin/egrep '[[:space:]]'
this line has whitespace

Another option might be to just use perl:

$ echo 'this line has whitespace
thislinedoesnthave' | perl -ne 'chomp;print "$_\n" if /[[:space:]]/'
this line has whitespace
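
If the same script has to run on both Solaris and other systems, one option is to pick the binary at run time. A minimal sketch, assuming the XPG4 path above and falling back to grep -E elsewhere (the EGREP variable name is just illustrative):

```shell
# Prefer the XPG4 egrep where it exists (Solaris); otherwise assume grep -E works.
if [ -x /usr/xpg4/bin/egrep ]; then
    EGREP=/usr/xpg4/bin/egrep
else
    EGREP='grep -E'
fi

# $EGREP is left unquoted on purpose so 'grep -E' splits into command + flag.
result=$(printf 'this line has whitespace\nthislinedoesnthave\n' | $EGREP '[[:space:]]')
echo "$result"
```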

Upvotes: 25

paxdiablo

Reputation: 882466

If you're using 'degraded' versions of grep (I quote the term because most UNIXes I work on still use the original REs, not those fancy ones with "\s" or "[[:space:]]" :-), you can just revert to the lowest form of RE.

For example, if :space: is defined as spaces and tabs, just use:

egrep '[ ^I]' file

That ^I is an actual tab character, not the two characters ^ and I.

This assumes :space: is defined as just tabs and spaces (in the POSIX locale, [[:space:]] also covers newline, carriage return, form feed and vertical tab); otherwise adjust the characters within the [].

The advantage of using degraded REs is that they should work on all platforms (at least for ASCII; Unicode or non-English languages may have different rules but I rarely find a need).
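
If typing a literal tab into the command line is awkward, printf can supply it. A small sketch of the same bracket expression, using made-up sample input (plain grep is enough here, since a bracket expression needs no extended syntax):

```shell
# Build a literal tab without relying on the editor or on \t support in the regex.
tab=$(printf '\t')

# Sample input (made up): one line with a space, one with a tab, one with neither.
matches=$(printf 'has space\nhas\ttab\nnowhitespace\n' | grep -c "[ ${tab}]")
echo "$matches"
```

The count covers the first two lines only; the third contains neither a space nor a tab.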

Upvotes: 15

Giacomo

Reputation: 11247

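Maybe you should protect the pattern with quotes (single quotes in bash, or the equivalent for the shell you are using).

[ and ] have special meaning to the shell: an unquoted [[:space:]] is a filename glob, so the shell may expand it before egrep ever sees it.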
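
A rough way to see the difference, assuming a shell whose globbing understands POSIX character classes (bash does). The temporary directory and the file named with a single space are throwaway examples:

```shell
start=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1
: > ' '                  # create a file whose name is a single space

set -- [[:space:]]       # unquoted: the shell may glob this against file names
unquoted=$1
set -- '[[:space:]]'     # quoted: the pattern is passed through untouched
quoted=$1

printf 'unquoted: "%s"\nquoted:   "%s"\n' "$unquoted" "$quoted"

cd "$start" && rm -rf "$dir"
```

When the glob matches, the unquoted form turns into the file name, so grep would receive a different pattern than the one you typed.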

Upvotes: 0
