Burton Samograd

Reputation: 3638

Grep pattern matching lower case string enclosed in double quotes

I'm having an issue with grep that I can't figure out. I'm trying to find all lowercase words enclosed in double quotes (C strings) in a set of source files. Using bash and GNU grep:

grep -e '"[a-z]+"' *.cpp

gives me no matches, while

grep -e '"[a-z]*"' *.cpp

gives me matches like "Abc", which contains more than just lowercase characters. What is the proper regular expression to match only "abc"?

Upvotes: 10

Views: 8238

Answers (4)

spookypeanut

Reputation: 503

If you don't want to mess about with locales, this worked for me:

grep -e '"[[:lower:]]\+"'
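A minimal sketch of the character-class approach, assuming GNU grep (the `\+` in a basic regular expression is a GNU extension) and using made-up sample lines in place of real source files. Because `[[:lower:]]` is defined by character class rather than by a collation range, it should exclude uppercase letters regardless of the locale's sort order:

```shell
# Sample input: one all-lowercase string and one mixed-case string (illustrative data).
sample='"abc"
"Abc"'

# [[:lower:]] matches lowercase letters only, whatever the collation order.
matches=$(printf '%s\n' "$sample" | grep -e '"[[:lower:]]\+"')

printf '%s\n' "$matches"
```

With this input, only the `"abc"` line is printed.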

Upvotes: 0

user unknown

Reputation: 36250

Escape the +:

grep -e '"[a-z]\+"' *.cpp

or use egrep:

egrep '"[a-z]+"' *.cpp

Maybe you had -E in mind:

grep -E '"[a-z]+"' *.cpp

The lowercase -e is used, for example, to specify multiple search patterns.
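To illustrate what `-e` is for, here is a small sketch that combines `-E` (extended syntax, so `+` needs no backslash) with two `-e` patterns, one for double-quoted and one for single-quoted lowercase words. The sample lines and the single-quote pattern are assumptions added for illustration, not part of the question:

```shell
# Illustrative input; in practice you would pass file names such as *.cpp.
# LC_ALL=C keeps [a-z] limited to the ASCII lowercase letters.
matches=$(printf '%s\n' '"abc"' "'def'" '"Abc"' |
  LC_ALL=C grep -E -e '"[a-z]+"' -e "'[a-z]+'")

printf '%s\n' "$matches"
```

A line is printed if either pattern matches it, so here the `"abc"` and `'def'` lines are printed and the `"Abc"` line is not.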

The uppercase matches might originate from your locale, which you can prevent with:

LC_ALL=C egrep '"[a-z]+"' *.cpp

Upvotes: 1

Don Stewart

Reputation: 137987

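You're forgetting to escape the metacharacter. In grep's default basic regular expressions, an unescaped + is a literal plus sign, so the "one or more" operator has to be written as \+: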

grep -e '"[a-z]\+"'

As for the second part, it matches mixed-case strings because of your locale, as the following shows:

$ echo '"Abc"' | grep -e '"[a-z]\+"'
"Abc"
$ export LC_ALL=C
$ echo '"Abc"' | grep -e '"[a-z]\+"'
$

To get ASCII-like behavior, you need to set your locale to "C", as described in the grep man page:

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, [a-d] is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is typically not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.
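If changing the locale for the whole shell session is undesirable, the variable can also be set for a single command. The sketch below wraps that in a hypothetical helper, `lower_strings` (a name invented here), and adds `-o`, a GNU/BSD grep option that prints only the matched text rather than the whole line. The sample line is made up for illustration:

```shell
# Hypothetical helper: print each all-lowercase double-quoted string read from stdin.
lower_strings() {
  # LC_ALL=C applies to this grep invocation only; -o prints matches, not whole lines.
  LC_ALL=C grep -o -E '"[a-z]+"'
}

found=$(printf '%s\n' 'puts("Abc"); puts("def");' | lower_strings)
printf '%s\n' "$found"
```

Without `-o`, grep would print the entire line because it contains at least one match; with it, only `"def"` is reported.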

Upvotes: 9

Nathan Fellman

Reputation: 127538

You probably need to escape the +:

grep -e '"[a-z]\+"' *.cpp

Upvotes: 0
