samar

Reputation: 5201

c# - regex to remove vowels from string except first and last character

I am trying to remove all the vowels from a string except the first and last characters. I have tried 2 expressions, each applied in 2 ways, but in vain. I have described them below. Does anybody have a regular expression for this?

e.g.

original string -- source = apeaple

after regex -- source_modified = apple (this is what is expected)

I tried the expression ([a-zA-Z])[aeiouAEIOU]([a-zA-Z]), but it removes a repeated character as well. This is what happens when I apply it:

code used --

Regex reg = new Regex("([a-zA-Z])[aeiouAEIOU]([a-zA-Z])");
string source_modified = reg.Replace(source, "");

original string -- source = apeaple

after code execution -- source_modified = aple (repeated character removed)

code used -- string source_modified = Regex.Replace(source, "([a-zA-Z])[aeiouAEIOU]([a-zA-Z])", "$1" + "$2");

original string -- source = apeaple

after code execution -- source_modified = apaple (just 1 vowel gets removed)

I also tried ([a-zA-Z])[aeiouAEIOU]*([a-zA-Z]), but it does not remove all the vowels either. This is what happens when I apply it:

code used --

Regex reg = new Regex("([a-zA-Z])[aeiouAEIOU]*([a-zA-Z])");
string source_modified = reg.Replace(source, "");

original string -- source = apeaple

after code execution -- source_modified = "" (all characters are removed)

code used -- string source_modified = Regex.Replace(source, "([a-zA-Z])[aeiouAEIOU]*([a-zA-Z])", "$1" + "$2");

original string -- source = apeaple

after code execution -- source_modified = apeple
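The four calls above can be reproduced side by side to see why none of them works. This is a small sketch (the class and method names are made up for illustration) that returns the outputs reported in the question. The underlying reason is that Regex.Replace scans left to right and its matches never overlap, so a letter consumed as the "context" of one match cannot be examined again as a vowel.

```csharp
using System.Text.RegularExpressions;

// Reproduces the question's four attempts (names are hypothetical).
public static class QuestionAttempts
{
    // Exactly one vowel between two letters.
    const string Single = "([a-zA-Z])[aeiouAEIOU]([a-zA-Z])";
    // Zero or more vowels between two letters.
    const string Run = "([a-zA-Z])[aeiouAEIOU]*([a-zA-Z])";

    // Replacing with "" deletes the two context letters along with the vowel.
    public static string SingleRemove(string s) => Regex.Replace(s, Single, "");

    // "$1$2" puts the context letters back, but the scan resumes after the
    // second one, so a vowel used as context ($2) is never removed.
    public static string SingleKeep(string s) => Regex.Replace(s, Single, "$1$2");

    // The * allows zero vowels, so any two adjacent letters form a match
    // and the whole string is consumed.
    public static string RunRemove(string s) => Regex.Replace(s, Run, "");

    // Only vowel runs that happen to sit inside a match are dropped.
    public static string RunKeep(string s) => Regex.Replace(s, Run, "$1$2");
}
```

For "apeaple" these return "aple", "apaple", "" and "apeple" respectively, matching the results above.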

Upvotes: 3

Views: 7139

Answers (4)

buckley

Reputation: 14099

You need some lookaround, like so:

(?<!^)[aouieyAOUIEY](?!$)

C# supports it, and it's very powerful:

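// Requires: using System; using System.Text.RegularExpressions;
string subjectString = "apeaple";
string resultString = null;
try {
    // Same pattern as above: a vowel (or y) that is neither the first nor the last character
    resultString = Regex.Replace(subjectString, "(?<!^)[aouieyAOUIEY](?!$)", "");
} catch (ArgumentException) {
    // Syntax error in the regular expression
}
// resultString is now "apple"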
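The lookaround pattern can also be wrapped in a reusable helper. This is a minimal sketch (the class name LookaroundStripper is hypothetical) that compiles the regex once; (?<!^) rejects a vowel at position 0 and (?!$) rejects a vowel at the end of the string.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper around the lookaround pattern from this answer.
public static class LookaroundStripper
{
    // (?<!^) : the vowel is not at the start of the string
    // (?!$)  : the vowel is not at the end of the string
    static readonly Regex InnerVowel = new Regex("(?<!^)[aouieyAOUIEY](?!$)");

    public static string Strip(string input) => InnerVowel.Replace(input, "");
}
```

Strip("apeaple") returns "apple". Because y is in the character class, Strip("Anyanka") returns "Annka", which is the case Update 1 addresses.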

Update 1

T.W.R.Cole informs me that there is a special rule in the English language: "this doesn't work for words like 'Anyanka', where an inner 'y' is used as a consonant".

The following change should do this, using the technique of negative lookahead:

(?<!^)([aouie]|y(?![aouie]))(?!$)

This time, enable case-insensitive matching (RegexOptions.IgnoreCase in C#); it makes the regex simpler than the original.
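With case-insensitive matching turned on, the character classes only need lowercase letters. A minimal sketch (the class name YAwareStripper is hypothetical) of the pattern above passed together with RegexOptions.IgnoreCase:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for the Update 1 pattern.
public static class YAwareStripper
{
    // A 'y' counts as a vowel only when the next character is not a vowel.
    // IgnoreCase lets [aouie] and 'y' also match their uppercase forms.
    static readonly Regex InnerVowel = new Regex(
        "(?<!^)([aouie]|y(?![aouie]))(?!$)",
        RegexOptions.IgnoreCase);

    public static string Strip(string input) => InnerVowel.Replace(input, "");
}
```

Here Strip("Anyanka") keeps the consonant y and returns "Anynka", and Strip("APEAPLE") returns "APPLE".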

If a y followed by another y still means that the first y is a consonant (er... is there such a word?) and thus should not disappear, then y must be listed in the lookahead's character class as well:

(?<!^)([aouie]|y(?![aouiey]))(?!$)
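The two variants differ only in whether y appears in the lookahead class. This sketch compares them on "xyyz", a made-up string used only because the answer is unsure a real word with a double y exists; the class and method names are likewise hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison of the two Update 1 patterns.
public static class DoubleYComparison
{
    // y is dropped unless the next character is a vowel.
    const string YBeforeVowel = "(?<!^)([aouie]|y(?![aouie]))(?!$)";
    // y is dropped unless the next character is a vowel or another y.
    const string YBeforeVowelOrY = "(?<!^)([aouie]|y(?![aouiey]))(?!$)";

    public static string WithoutYInLookahead(string input) =>
        Regex.Replace(input, YBeforeVowel, "", RegexOptions.IgnoreCase);

    public static string WithYInLookahead(string input) =>
        Regex.Replace(input, YBeforeVowelOrY, "", RegexOptions.IgnoreCase);
}
```

For "xyyz" the first variant removes both y characters ("xz"), while the second keeps the y that is followed by another y ("xyz"). Both return "Anynka" for "Anyanka".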

I repeat that I used C# as my regex dialect, which has good support for lookaround.

Upvotes: 7

Wormbo

Reputation: 4992

In case you ever want to apply that to individual words in strings that consist of more than one word, \B[AEIOUaeiou]\B might be worth a try. \B is a non-word-boundary, i.e. any location where the two adjacent characters are either both word characters or both non-word characters. The latter case is obviously not possible if there's a vowel between the two locations.

Needless to say, it also works for strings consisting of a single word.
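A minimal sketch of the \B approach on a multi-word string (the class name PerWordStripper is hypothetical). Since \B never matches at the start or end of a word, the first and last vowels of every word survive.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for the per-word \B pattern.
public static class PerWordStripper
{
    // \B on both sides: the vowel must sit between two word characters,
    // i.e. it is neither the first nor the last letter of its word.
    public static string Strip(string input) =>
        Regex.Replace(input, @"\B[AEIOUaeiou]\B", "");
}
```

For example, Strip("apeaple banana") returns "apple bnna", and Strip("apeaple") alone returns "apple".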

Upvotes: 0

Eliot Ball

Reputation: 728

You need to start the string with at least one character, find a vowel and then end the string with at least one character. Try:

(.+)[aeiouAEIOU](.+)

Upvotes: 0

Shai

Reputation: 25619

If so, why not remove the first and last characters, strip the vowels from the rest, and then stitch the string back together?

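// Requires: using System; using System.Text.RegularExpressions;
string sWord = "apeaple";
char cFirst = sWord[0], cLast = sWord[sWord.Length - 1];

// Keep only the middle part (everything except the first and last characters)
sWord = sWord.Substring(1, sWord.Length - 2);

sWord = cFirst.ToString() +
        Regex.Replace(sWord, "[aouiyeAOUIYE]", String.Empty) +
        cLast.ToString();
// sWord is now "apple"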
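The same idea can be packaged as a helper. This sketch (the class name EndsPreservingStripper is hypothetical) adds one assumption not in the answer: strings shorter than three characters are returned unchanged, because sWord[0] throws on an empty string and Substring(1, sWord.Length - 2) throws for a one-character string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper based on the split-and-stitch approach.
public static class EndsPreservingStripper
{
    public static string Strip(string word)
    {
        // Nothing to clean between the ends, and indexing/Substring
        // would throw for lengths 0 and 1.
        if (word == null || word.Length < 3)
            return word;

        char first = word[0];
        char last = word[word.Length - 1];
        string middle = word.Substring(1, word.Length - 2);

        // char + string and string + char both produce string concatenation.
        return first + Regex.Replace(middle, "[aouiyeAOUIYE]", String.Empty) + last;
    }
}
```

Strip("apeaple") returns "apple", while Strip("a") and Strip("") come back unchanged instead of throwing.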

Upvotes: 7
