Reputation: 128
I want to delete all instances of "aA", "bB" ... "zZ" from an input string.
e.g.
echo "foObar" |
sed -Ee 's/([a-z])\U\1//'
should output "fbar"
But the \U syntax works in the latter half (replacement part) of the sed expression - it fails to resolve in the matching clause.
I'm having difficulty converting the matched character to upper case to reuse in the matching clause.
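The replacement-side behaviour described above can be checked with a minimal sketch. It assumes GNU sed, since `\U` is a GNU extension and other sed implementations may not support it. The command upper-cases the first lowercase letter, which shows the conversion working in the replacement part:

```shell
# Assumes GNU sed: \U in the replacement upper-cases the text that follows it.
echo "foObar" | sed -E 's/([a-z])/\U\1/'
# prints: FoObar
```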
If anyone could suggest a working regex which can be used in sed (or awk) that would be great.
Scripting solutions in pure shell are ok too (I'm trying to think of solving the problem this way).
Working PCRE (Perl-compatible regular expressions) are ok too but I have no idea how they work so it might be nice if you could provide an explanation to go with your answer.
Unfortunately, I don't have perl or python installed on the machine that I am working with.
Upvotes: 2
Views: 1248
Reputation: 3950
Note: This solution is (unsurprisingly) slow, based on OP's feedback:
"Unfortunately, due to the multiple passes - it makes it rather slow. "
sed:
echo 'foObar foobAr' | sed -E -e 's/([a-z])([A-Z])/KEYWORD\1\l\2/g' -e 's/KEYWORD(.)\1//g' -e 's/KEYWORD(.)(.)/\1\u\2/g'
gives you: fbar foobAr
Replacement stages explained:
foObar foobAr
-> fKEYWORDoobar fooKEYWORDbar
fKEYWORDoobar fooKEYWORDbar
-> fbar fooKEYWORDbar
fbar fooKEYWORDbar
-> fbar foobAr
¹ In this example I used KEYWORD for demonstration purposes. A single character, or at least a shorter character sequence, would be better/faster. Just make sure to pick something that cannot possibly appear in the input.
² The remaining occurrences are those where the lowercase versions of the letters were not identical, so we have to revert them to their original state.
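To make the three stages easier to follow, here is a sketch that runs each substitution separately and prints the intermediate results. It assumes GNU sed (`\l` and `\u` are GNU extensions) and that KEYWORD never occurs in the input. The variable names `input` and `stage1` to `stage3` are only illustrative.

```shell
# Assumes GNU sed (\l and \u are GNU extensions) and that the marker
# KEYWORD never occurs in the input.
input='foObar foobAr'

# Stage 1: mark every lowercase+uppercase pair and lower-case its second letter.
stage1=$(printf '%s\n' "$input" | sed -E 's/([a-z])([A-Z])/KEYWORD\1\l\2/g')
# Stage 2: delete marked pairs whose two letters are now identical.
stage2=$(printf '%s\n' "$stage1" | sed -E 's/KEYWORD(.)\1//g')
# Stage 3: restore the remaining marked pairs to their original case.
stage3=$(printf '%s\n' "$stage2" | sed -E 's/KEYWORD(.)(.)/\1\u\2/g')

printf '%s\n' "$stage1" "$stage2" "$stage3"
# prints:
# fKEYWORDoobar fooKEYWORDbar
# fbar fooKEYWORDbar
# fbar foobAr
```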
Upvotes: 1
Reputation: 58440
This might work for you (GNU sed):
sed -r 's/aA|bB|cC|dD|eE|fF|gG|hH|iI|jJ|kK|lL|mM|nN|oO|pP|qQ|rR|sS|tT|uU|vV|wW|xX|yY|zZ//g' file
A programmatic solution:
sed 's/[[:lower:]][[:upper:]]/\n&/g;s/\n\(.\)\1//ig;s/\n//g' file
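This marks every pair consisting of a lower-case character followed by an upper-case character by inserting a newline in front of it. The second command then removes each marker together with its pair when the two characters match via a back reference, irrespective of case (the i flag). Finally, any remaining newlines are removed, which leaves pairs that are not the same letter untouched. A newline is a safe marker because sed reads input line by line, so the pattern space does not otherwise contain one.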
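If typing out the 26 alternatives is a concern, the same pattern can be generated. The following is a sketch only: `strip_pairs` is a hypothetical helper name, and it assumes a sed that accepts `-E` for extended regular expressions (GNU and BSD sed both do).

```shell
# Hypothetical helper: builds the pattern "aA|bB|...|zZ" and applies it.
# Assumes a sed that supports -E (extended regular expressions).
strip_pairs() {
    pairs=
    for c in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        # Upper-case the current letter.
        u=$(printf '%s' "$c" | tr '[:lower:]' '[:upper:]')
        # Append "cC", separated from earlier pairs by "|".
        pairs="${pairs:+$pairs|}$c$u"
    done
    sed -E "s/$pairs//g"
}

printf '%s\n' 'foObar foobAr' | strip_pairs
# prints: fbar foobAr
```

This stays a single substitution pass, like the hand-written alternation, and does not rely on GNU-only case-conversion escapes.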
Upvotes: 3
Reputation: 60303
There's an easy lex for this,
%option main 8bit
	#include <ctype.h>
%%
[[:lower:]][[:upper:]] if ( toupper(yytext[0]) != yytext[1] ) ECHO;
(that's a tab before the #include, markdown loses those). Just put that in e.g. that.l and then make that. Easy-peasy lex's are a nice addition to your toolkit.
Upvotes: 1
Reputation: 785286
Here is a verbose awk solution as OP doesn't have perl or python available:
echo "foObar" |
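awk -v ORS= -v FS='' '{
   for (i=2; i<=NF; i++) {
      # skip both characters of a matching pair such as oO
      if ($(i-1) == tolower($i) && $i ~ /[A-Z]/ && $(i-1) ~ /[a-z]/) {
         i++
         continue
      }
      print $(i-1)
   }
   # print the last character unless it was part of a removed pair
   print $(i-1)
}'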
fbar
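The script above relies on an empty FS splitting the line into one field per character, which is an extension (gawk supports it, among others) rather than behaviour that POSIX specifies. As a hedged alternative, the same scan can be sketched with `substr`. The function name `strip_pairs_awk` is hypothetical, and unlike the original this version prints a trailing newline.

```shell
# Hypothetical helper: same left-to-right scan as above, but using substr()
# instead of relying on an empty FS.
strip_pairs_awk() {
    awk '{
        out = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)        # current character
            d = substr($0, i + 1, 1)    # next character (empty at end of line)
            if (c ~ /[a-z]/ && d ~ /[A-Z]/ && c == tolower(d)) {
                i++                     # skip the pair: c and d
                continue
            }
            out = out c
        }
        print out
    }'
}

printf '%s\n' 'foObar foobAr' | strip_pairs_awk
# prints: fbar foobAr
```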
Upvotes: 2
Reputation: 626936
You may use the following perl solution:
echo "foObar" | perl -pe 's/([a-z])(?!\1)(?i:\1)//g'
See the online demo.
Details
([a-z]) - Group 1: a lowercase ASCII letter
(?!\1) - a negative lookahead that fails the match if the next char is the same as captured with Group 1
(?i:\1) - the same char as captured with Group 1 but in the different case (due to the lookahead before it).
The -e option allows you to define Perl code to be executed by the compiler and the -p option always prints the contents of $_ each time around the loop. See more here.
Upvotes: 3