Reputation: 3741
Please note that I need this answer in AWK.
How can I remove all lowercase characters from some awk variable? I tried calling gsub:
gsub(/[a-z]+/,"",varName);
Unfortunately, that removes the whole string, as if awk cannot tell the difference between lower and upper case. Is there some regex-fu I can use that I'm not aware of?
EDIT: Confirmed, awk does not see the difference between lowercase and uppercase characters.
Example 1 (will use letter f here for better understanding of results):
varName="CHRFProtocol";
gsub(/[a-z]/,"f",varName);
Result: ffffffffffff
Example 2 (again, will use letter f here for better understanding of results):
varName="CHRFProtocol";
gsub(/[A-Z]/,"f",varName);
Result: ffffffffffff
Is this legitimate? What's going on?
Upvotes: 3
Views: 4044
Reputation: 203684
You should just be using the POSIX character class [[:lower:]], not [a-z]:
gsub(/[[:lower:]]/,"",varName)
The latter depends on the locale's collation order; the former does not.
It seems like there's some confusion over when to use POSIX character classes vs when/how to set locale so:
1) Always use POSIX character classes when they exist for the character set you're interested in (e.g. [:digit:], [:lower:], [:punct:], etc.)
2) Otherwise, set LC_ALL=C IF you're OK with how that affects your other settings (e.g. comma vs period as the thousands separator)
3) Otherwise, set LC_COLLATE=C.
See http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html and http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html for more info on character classes and locale variables.
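As a minimal sketch of recommendation 1, the snippet below runs the question's sample string through gsub with [[:lower:]]. The shell variable name result is only an illustrative choice, and the sketch assumes an awk that supports POSIX character classes (gawk does; some very old mawk builds reportedly do not).

```shell
# Remove every lowercase letter from the question's sample string.
# [[:lower:]] does not rely on the locale's collation order, unlike [a-z].
result=$(awk 'BEGIN {
    varName = "CHRFProtocol"
    gsub(/[[:lower:]]/, "", varName)
    print varName
}')
echo "$result"
```

With the sample string this should print CHRFP, whatever collation order the current locale uses.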
Upvotes: 3
Reputation: 95267
Your locale settings are getting in the way. Try this:
LC_ALL=C awk 'BEGIN {
varName="CHRFProtocol";
gsub(/[a-z]/,"f",varName);
print(varName); }'
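GNU awk honors locale settings, and in many national locales on Linux the collation order interleaves upper- and lowercase letters (aAbB...zZ), so a range such as [a-z] also matches most uppercase letters. Resetting the locale to C (=POSIX) for the duration of the awk command makes ranges follow plain ASCII order again, which restores case-sensitivity.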
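To check both examples from the question under the C locale, something like the following could be used. The function name replace_with_f is hypothetical and exists only for this illustration; the bracket expression is passed in as a dynamic regex through awk's -v option.

```shell
# Hypothetical helper: replace every match of the bracket expression $1
# in the question's sample string with "f", under the C locale.
replace_with_f() {
    LC_ALL=C awk -v re="$1" 'BEGIN {
        varName = "CHRFProtocol"
        gsub(re, "f", varName)
        print varName
    }'
}

replace_with_f '[a-z]'   # only the lowercase letters become f
replace_with_f '[A-Z]'   # only the uppercase letters become f
```

Under LC_ALL=C the first call should print CHRFPfffffff and the second fffffrotocol, rather than twelve f characters in both cases.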
Upvotes: 5
Reputation: 195109
An example explains everything:
kent$ awk 'BEGIN{var="AaBbCcDDDdddEEEeee";print "before:"var;gsub(/[a-z]/,"",var);print "after:"var}'
before:AaBbCcDDDdddEEEeee
after:ABCDDDEEE
Upvotes: 1
Reputation: 785276
To remove all lowercase characters in awk, use:
gsub(/[a-z]+/, "", varName);
In your examples you're actually replacing each lowercase letter with the literal string "f", rather than removing it.
EDIT After you've corrected your question:
Note that if your varName contains only lowercase letters, or is already empty, then you will get an empty string in varName.
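The difference between matching a single lowercase letter and matching one or more of them only shows up when the replacement is not empty. A small sketch, run under LC_ALL=C so that [a-z] means exactly the ASCII lowercase letters (the variable names out, a and b are illustrative only):

```shell
# Compare /[a-z]/ with /[a-z]+/ using a visible replacement string.
out=$(LC_ALL=C awk 'BEGIN {
    a = b = "CHRFProtocol"
    gsub(/[a-z]/,  "f", a)   # each lowercase letter is replaced separately
    gsub(/[a-z]+/, "f", b)   # each run of lowercase letters is replaced once
    print a
    print b
}')
echo "$out"
```

This should print CHRFPfffffff followed by CHRFPf. With an empty replacement string both patterns leave CHRFP.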
Upvotes: 1