Alex

Reputation: 23

Use regex to match the first occurrence of a letter in each word

I'm trying to replace the first occurrence of “I” in each word, other than at the start of the word, with “ee”. I'm using Java.

This should change the phrase

INFINITY IS GIANT.

To:

INFeeNITY IS GeeANT.

So far, my code has gone through several revisions. One is:

replaceAll("(?<=[^I*])\\BI", "ee");

That is using lookbehind, I think. Help is very much appreciated! Thanks.

Upvotes: 2

Views: 126

Answers (2)

MT0

Reputation: 167832

As you stated in the OP, \\BI finds an I that is not at the start of a word. If the regular expression then also consumes the rest of the word, using (?:\\B.)* or .*?\\b, the next match can only start in a later word, so a second I in the same word is never replaced.

"INFINITY IS GIANT".replaceAll( "\\BI((?:\\B.)*)", "ee$1");
"INFINITY IS GIANT".replaceAll( "\\BI(.*?\\b)", "ee$1");

Will both result in:

INFeeNITY IS GeeANT
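
To see why consuming the rest of the word matters, here is a minimal sketch that runs the lookbehind attempt from the question next to the two patterns above. The class name InnerIDemo and its method names are made up for illustration only.

```java
// Hypothetical demo class; the names are illustrative, not from the question.
class InnerIDemo {
    // The attempt from the question: nothing consumes the rest of the word,
    // so every non-initial I is replaced, not only the first one.
    static String lookbehindAttempt(String s) {
        return s.replaceAll("(?<=[^I*])\\BI", "ee");
    }

    // Capture the remainder of the word in group 1 and put it back,
    // so the scan resumes only after the end of the word.
    static String consumeRestOfWord(String s) {
        return s.replaceAll("\\BI((?:\\B.)*)", "ee$1");
    }

    // Same idea, using a lazy match up to the next word boundary.
    static String consumeToBoundary(String s) {
        return s.replaceAll("\\BI(.*?\\b)", "ee$1");
    }

    public static void main(String[] args) {
        String input = "INFINITY IS GIANT.";
        System.out.println(lookbehindAttempt(input));  // INFeeNeeTY IS GeeANT.
        System.out.println(consumeRestOfWord(input));  // INFeeNITY IS GeeANT.
        System.out.println(consumeToBoundary(input));  // INFeeNITY IS GeeANT.
    }
}
```

The first line shows both inner I's of INFINITY being replaced; the other two stop after the first one.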

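It even works if you have accents in the text (note that from JDK 19 on, \\b and \\B are ASCII-only by default, so there the pattern needs the (?U) flag to treat accented letters as word characters):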

"IŇFINIŦŶ IS ĞIANŤ".replaceAll( "\\BI((?:\\B.)*)", "ee$1");
"IŇFINIŦŶ IS ĞIANŤ".replaceAll( "\\BI(.*?\\b)", "ee$1");

Both output:

IŇFeeNIŦŶ IS ĞeeANŤ

Alternatively

The pattern \\b(.(?:\\B.)*?)\\BI matches from the start of the word up to the first I that is not the word's first letter:

"INFINITY IS GIANT".replaceAll( "\\b(.(?:\\B.)*?)\\BI", "$1ee");

Outputs:

INFeeNITY IS GeeANT

Upvotes: 2

Lucas Trzesniewski

Reputation: 51330

If you don't care about accented letters, this pattern will do the trick:

\b([a-zA-Z][a-hj-zA-HJ-Z]*)[iI]

Replace it with $1ee.


It matches the first letter of a word (\b[a-zA-Z]), then any number of letters except I ([a-hj-zA-HJ-Z]*), then an I.
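
In Java, the backslash has to be doubled inside the string literal. A minimal sketch, assuming a made-up helper class AsciiFirstI with a precompiled pattern:

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative.
class AsciiFirstI {
    // The ASCII-only pattern, with the backslash doubled for a Java string.
    private static final Pattern FIRST_INNER_I =
            Pattern.compile("\\b([a-zA-Z][a-hj-zA-HJ-Z]*)[iI]");

    // Group 1 holds everything from the start of the word up to the first
    // non-initial I; it is written back, followed by "ee".
    static String replace(String s) {
        return FIRST_INNER_I.matcher(s).replaceAll("$1ee");
    }

    public static void main(String[] args) {
        System.out.println(replace("INFINITY IS GIANT."));  // INFeeNITY IS GeeANT.
        System.out.println(replace("infinity is giant."));  // infeenity is geeant.
    }
}
```

Unlike the pattern in the question, this one also matches a lowercase i, and the replacement is always a lowercase ee.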

If you have to deal with accented letters, the pattern has to change somewhat:

(?<!\p{L})(\p{L}(?:(?![iI])\p{L})*)[iI]


Here, I used \p{L} which means any Unicode letter, but had to write (?![iI])\p{L} to mean any Unicode letter except I. I also replaced \b with (?<!\p{L}) to make sure I get Unicode support.
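
The same kind of sketch for the Unicode-aware pattern, again with a made-up class name (UnicodeFirstI). The accented sample string is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative.
class UnicodeFirstI {
    // (?<!\p{L}) stands in for \b, and (?![iI])\p{L} means
    // "any Unicode letter except i or I".
    private static final Pattern FIRST_INNER_I =
            Pattern.compile("(?<!\\p{L})(\\p{L}(?:(?![iI])\\p{L})*)[iI]");

    static String replace(String s) {
        return FIRST_INNER_I.matcher(s).replaceAll("$1ee");
    }

    public static void main(String[] args) {
        // The accented sample text used elsewhere in this thread,
        // spelled with escapes for the five accented capitals.
        String accented = "I\u0147FINI\u0166\u0176 IS \u011EIAN\u0164";
        System.out.println(replace(accented));
        System.out.println(replace("INFINITY IS GIANT."));  // INFeeNITY IS GeeANT.
    }
}
```

Because this pattern uses neither \b nor \B, it does not rely on how the regex engine defines word characters.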

Upvotes: 0
